Authors : Google
Summary description and brief definition of inputs and outputs.
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights for both pre-trained variants and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.
You can choose 3 programming languages to access our google/gemma-2-9b-it model.
We provide compatibility with the OpenAI API standard
The API Base URL
1https://api.novita.ai/v3/openai
Example of Using Chat Completions API
Generate a response using a list of messages from a conversation
1# Get the Novita AI API Key by referring to: https://novita.ai/docs/get-started/quickstart.html#_2-manage-api-key 2export API_KEY="{YOUR Novita AI API Key}" 3 4curl "https://api.novita.ai/v3/openai/chat/completions" \ 5 -H "Content-Type: application/json" \ 6 -H "Authorization: Bearer ${API_KEY}" \ 7 -d '{ 8 "model": "google/gemma-2-9b-it", 9 "messages": [ 10 { 11 "role": "system", 12 "content": "Act like you are a helpful assistant." 13 }, 14 { 15 "role": "user", 16 "content": "Hi there!" 17 } 18 ], 19 "max_tokens": 512 20}'
The response may look like this
1{ 2 "id": "chat-5f461a9a23a44ef29dbd3124b891afc0", 3 "object": "chat.completion", 4 "created": 1731584707, 5 "model": "google/gemma-2-9b-it", 6 "choices": [ 7 { 8 "index": 0, 9 "message": { 10 "role": "assistant", 11 "content": "Hello! It's nice to meet you. How can I assist you today? Do you have any questions or topics you'd like to discuss? I'm here to help with anything you need." 12 }, 13 "finish_reason": "stop", 14 "content_filter_results": { 15 "hate": { "filtered": false }, 16 "self_harm": { "filtered": false }, 17 "sexual": { "filtered": false }, 18 "violence": { "filtered": false }, 19 "jailbreak": { "filtered": false, "detected": false }, 20 "profanity": { "filtered": false, "detected": false } 21 } 22 } 23 ], 24 "usage": { 25 "prompt_tokens": 46, 26 "completion_tokens": 40, 27 "total_tokens": 86, 28 "prompt_tokens_details": null, 29 "completion_tokens_details": null 30 }, 31 "system_fingerprint": "" 32}
If you want to receive a response via streaming, simply pass "stream": true
in the request (see the difference on line 20). An example is provided.
1# Get the Novita AI API Key by referring to: https://novita.ai/docs/get-started/quickstart.html#_2-manage-api-key 2export API_KEY="{YOUR Novita AI API Key}" 3 4curl "https://api.novita.ai/v3/openai/chat/completions" \ 5 -H "Content-Type: application/json" \ 6 -H "Authorization: Bearer ${API_KEY}" \ 7 -d '{ 8 "model": "google/gemma-2-9b-it", 9 "messages": [ 10 { 11 "role": "system", 12 "content": "Act like you are a helpful assistant." 13 }, 14 { 15 "role": "user", 16 "content": "Hi there!" 17 } 18 ], 19 "max_tokens": 512, 20 "stream": true 21}'
The response may look like this
1data: {"id":"chat-d821b951d6ff43ab838d18137aef7d0a","object":"chat.completion.chunk","created":1731586102,"model":"meta-llama/llama-3.1-8b-instruct","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null,"content_filter_results":{"hate":{"filtered":false},"self_harm":{"filtered":false},"sexual":{"filtered":false},"violence":{"filtered":false},"jailbreak":{"filtered":false,"detected":false},"profanity":{"filtered":false,"detected":false}}}],"system_fingerprint":""} 2 3... 4 5data: {"id":"chat-d821b951d6ff43ab838d18137aef7d0a","object":"chat.completion.chunk","created":1731586102,"model":"meta-llama/llama-3.1-8b-instruct","choices":[{"index":0,"delta":{"content":"n, ne"},"finish_reason":null,"content_filter_results":{"hate":{"filtered":false},"self_harm":{"filtered":false},"sexual":{"filtered":false},"violence":{"filtered":false},"jailbreak":{"filtered":false,"detected":false},"profanity":{"filtered":false,"detected":false}}}],"system_fingerprint":""} 6 7data: {"id":"chat-d821b951d6ff43ab838d18137aef7d0a","object":"chat.completion.chunk","created":1731586102,"model":"meta-llama/llama-3.1-8b-instruct","choices":[{"index":0,"delta":{"content":"ed"},"finish_reason":null,"content_filter_results":{"hate":{"filtered":false},"self_harm":{"filtered":false},"sexual":{"filtered":false},"violence":{"filtered":false},"jailbreak":{"filtered":false,"detected":false},"profanity":{"filtered":false,"detected":false}}}],"system_fingerprint":""} 8 9data: {"id":"chat-d821b951d6ff43ab838d18137aef7d0a","object":"chat.completion.chunk","created":1731586102,"model":"meta-llama/llama-3.1-8b-instruct","choices":[{"index":0,"delta":{"content":" assi"},"finish_reason":null,"content_filter_results":{"hate":{"filtered":false},"self_harm":{"filtered":false},"sexual":{"filtered":false},"violence":{"filtered":false},"jailbreak":{"filtered":false,"detected":false},"profanity":{"filtered":false,"detected":false}}}],"system_fingerprint":""} 10 11data: {"id":"chat-d821b951d6ff43ab838d18137aef7d0a","object":"chat.completion.chunk","created":1731586102,"model":"meta-llama/llama-3.1-8b-instruct","choices":[{"index":0,"delta":{"content":"s"},"finish_reason":null,"content_filter_results":{"hate":{"filtered":false},"self_harm":{"filtered":false},"sexual":{"filtered":false},"violence":{"filtered":false},"jailbreak":{"filtered":false,"detected":false},"profanity":{"filtered":false,"detected":false}}}],"system_fingerprint":""} 12 13data: {"id":"chat-d821b951d6ff43ab838d18137aef7d0a","object":"chat.completion.chunk","created":1731586102,"model":"meta-llama/llama-3.1-8b-instruct","choices":[{"index":0,"delta":{"content":"tan"},"finish_reason":null,"content_filter_results":{"hate":{"filtered":false},"self_harm":{"filtered":false},"sexual":{"filtered":false},"violence":{"filtered":false},"jailbreak":{"filtered":false,"detected":false},"profanity":{"filtered":false,"detected":false}}}],"system_fingerprint":""} 14 15data: {"id":"chat-d821b951d6ff43ab838d18137aef7d0a","object":"chat.completion.chunk","created":1731586102,"model":"meta-llama/llama-3.1-8b-instruct","choices":[{"index":0,"delta":{"content":"ce wi"},"finish_reason":null,"content_filter_results":{"hate":{"filtered":false},"self_harm":{"filtered":false},"sexual":{"filtered":false},"violence":{"filtered":false},"jailbreak":{"filtered":false,"detected":false},"profanity":{"filtered":false,"detected":false}}}],"system_fingerprint":""} 16 17... 18 19data: {"id":"chat-d821b951d6ff43ab838d18137aef7d0a","object":"chat.completion.chunk","created":1731586102,"model":"meta-llama/llama-3.1-8b-instruct","choices":[{"index":0,"delta":{"content":" "},"finish_reason":null,"content_filter_results":{"hate":{"filtered":false},"self_harm":{"filtered":false},"sexual":{"filtered":false},"violence":{"filtered":false},"jailbreak":{"filtered":false,"detected":false},"profanity":{"filtered":false,"detected":false}}}],"system_fingerprint":""} 20 21data: {"id":"chat-d821b951d6ff43ab838d18137aef7d0a","object":"chat.completion.chunk","created":1731586102,"model":"meta-llama/llama-3.1-8b-instruct","choices":[{"index":0,"delta":{"content":"just want to chat?"},"finish_reason":"stop","content_filter_results":{"hate":{"filtered":false},"self_harm":{"filtered":false},"sexual":{"filtered":false},"violence":{"filtered":false},"jailbreak":{"filtered":false,"detected":false},"profanity":{"filtered":false,"detected":false}}}],"system_fingerprint":""} 22 23data: [DONE]
Model Parameters
Feel free to check out our documentation for more details.
First, install the official OpenAI Python client
1pip install 'openai>=1.0.0'
and then you can run inferences with us
Example of Using Chat Completions API
Generate a response using a list of messages from a conversation
1from openai import OpenAI 2 3client = OpenAI( 4 base_url="https://api.novita.ai/v3/openai", 5 # Get the Novita AI API Key by referring to: https://novita.ai/docs/get-started/quickstart.html#_2-manage-api-key. 6 api_key="<YOUR Novita AI API Key>", 7) 8 9model = "google/gemma-2-9b-it" 10stream = True # or False 11max_tokens = 512 12 13chat_completion_res = client.chat.completions.create( 14 model=model, 15 messages=[ 16 { 17 "role": "system", 18 "content": "Act like you are a helpful assistant.", 19 }, 20 { 21 "role": "user", 22 "content": "Hi there!", 23 } 24 ], 25 stream=stream, 26 max_tokens=max_tokens, 27) 28 29if stream: 30 for chunk in chat_completion_res: 31 print(chunk.choices[0].delta.content or "") 32else: 33 print(chat_completion_res.choices[0].message.content)
If you set stream: true
(line 10), the print may look like this
1It' 2s 3 ni 4ce to 5meet you. 6Is 7 the 8re so 9meth 10ing I 11 can h 12e 13lp 14you wi 15th t 16oday, 17 or 18 woul 19d 20 you like to chat?
If you don't want to receive a response via streaming, simply set stream: false
. The output will look like this
1How can I assist you today? Do you have any questions or topics you'd like to discuss?
Model Parameters
Feel free to check out our documentation for more details.
First, install the official OpenAI JavaScript client
1npm install openai
and then you can run inferences with us in the browser or in node.js
Example of Using Chat Completions API
Generate a response using a list of messages from a conversation
1import OpenAI from "openai"; 2 3const openai = new OpenAI({ 4 baseURL: "https://api.novita.ai/v3/openai", 5 apiKey: "<YOUR Novita AI API Key>", 6}); 7const stream = true; // or false 8 9async function run() { 10 const completion = await openai.chat.completions.create({ 11 messages: [ 12 { 13 role: "system", 14 content: "Act like you are a helpful assistant.", 15 }, 16 { 17 role: "user", 18 content: "Hi there!" 19 } 20 ], 21 model: "google/gemma-2-9b-it", 22 stream 23 }); 24 25 if (stream) { 26 for await (const chunk of completion) { 27 if (chunk.choices[0].finish_reason) { 28 console.log(chunk.choices[0].finish_reason); 29 } else { 30 console.log(chunk.choices[0].delta.content); 31 } 32 } 33 } else { 34 console.log(JSON.stringify(completion)); 35 } 36} 37 38run();
If you set stream: true
(line 7), the print may look like this
1It' 2s 3 nic 4e to 5 m 6eet you 7. Ho 8w can 9I 10 as 11sist 12 you 13toda 14y? Do you 15hav 16e any q 17uest 18io 19ns or 20 to 21pics you 22' 23d 24li 25ke to 26 di 27scuss 28stop
If you don't want to receive a response via streaming, simply set stream: false
. The output will look like this
1{ 2 "id": "chat-a3ff0e39b4c24abcbd258ab1a1f38db9", 3 "object": "chat.completion", 4 "created": 1731642457, 5 "model": "google/gemma-2-9b-it", 6 "choices": [ 7 { 8 "index": 0, 9 "message": { 10 "role": "assistant", 11 "content": "How can I help you today? Would you like to talk about something specific or just have a chat? I'm here to assist you with any questions or information you might need." 12 }, 13 "finish_reason": "stop", 14 "content_filter_results": { 15 "hate": { "filtered": false }, 16 "self_harm": { "filtered": false }, 17 "sexual": { "filtered": false }, 18 "violence": { "filtered": false }, 19 "jailbreak": { "filtered": false, "detected": false }, 20 "profanity": { "filtered": false, "detected": false } 21 } 22 } 23 ], 24 "usage": { 25 "prompt_tokens": 46, 26 "completion_tokens": 37, 27 "total_tokens": 83, 28 "prompt_tokens_details": null, 29 "completion_tokens_details": null 30 }, 31 "system_fingerprint": "" 32}
Model Parameters
Feel free to check out our documentation for more details.
Data used for model training and how the data was processed.
These models were trained on a dataset of text data that includes a wide variety of sources. The 27B model was trained with 13 trillion tokens and the 9B model was trained with 8 trillion tokens. Here are the key components:
The combination of these diverse data sources is crucial for training a powerful language model that can handle a wide variety of different tasks and text formats.
Here are the key data cleaning and filtering methods applied to the training data:
Details about the model internals.
Gemma was trained using the latest generation of Tensor Processing Unit (TPU) hardware (TPUv5p).
Training large language models requires significant computational power. TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain:
Training was done using JAX and ML Pathways.
JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models.
ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is specially suitable for foundation models, including large language models like these ones.
Together, JAX and ML Pathways are used as described in the paper about the Gemini family of models; "the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow."
Model evaluation metrics and results.
These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation:
Benchmark | Metric | Gemma PT 9B | Gemma PT 27B |
---|---|---|---|
MMLU | 5-shot, top-1 | 71.3 | 75.2 |
HellaSwag | 10-shot | 81.9 | 86.4 |
PIQA | 0-shot | 81.7 | 83.2 |
SocialIQA | 0-shot | 53.4 | 53.7 |
BoolQ | 0-shot | 84.2 | 84.8 |
WinoGrande | partial score | 80.6 | 83.7 |
ARC-e | 0-shot | 88 | 88.6 |
ARC-c | 25-shot | 68.4 | 71.4 |
TriviaQA | 5-shot | 76.6 | 83.7 |
Natural Questions | 5-shot | 29.2 | 34.5 |
HumanEval | pass@1 | 40.2 | 51.8 |
MBPP | 3-shot | 52.4 | 62.6 |
GSM8K | 5-shot, maj@1 | 68.6 | 74 |
MATH | 4-shot | 36.6 | 42.3 |
AGIEval | 3-5-shot | 52.8 | 55.1 |
BIG-Bench | 3-shot, CoT | 68.2 | 74.9 |
------------------------------ | ------------- | ----------- | ------------ |
Ethics and safety evaluation approach and results.
Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including:
The results of ethics and safety evaluations are within acceptable thresholds for meeting internal policies for categories such as child safety, content safety, representational harms, memorization, large-scale harms. On top of robust internal evaluations, the results of well-known safety benchmarks like BBQ, BOLD, Winogender, Winobias, RealToxicity, and TruthfulQA are shown here.
Gemma 2.0
Benchmark | Metric | Gemma 2 IT 9B | Gemma 2 IT 27B |
---|---|---|---|
RealToxicity | average | 8.25 | 8.84 |
CrowS-Pairs | top-1 | 37.47 | 36.67 |
BBQ Ambig | 1-shot, top-1 | 88.58 | 85.99 |
BBQ Disambig | top-1 | 82.67 | 86.94 |
Winogender | top-1 | 79.17 | 77.22 |
TruthfulQA | 50.27 | 51.6 | |
Winobias 1_2 | 78.09 | 81.94 | |
Winobias 2_2 | 95.32 | 97.22 | |
Toxigen | 39.3 | 38.42 | |
------------------------ | ------------- | --------------- | ---------------- |
These models have certain limitations that users should be aware of.
Open Large Language Models (LLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development.
The development of large language models (LLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following:
Risks identified and mitigations:
At the time of release, this family of models provides high-performance open large language model implementations designed from the ground up for Responsible AI development compared to similarly sized models.
Using the benchmark evaluation metrics described in this document, these models have shown to provide superior performance to other, comparably-sized open model alternatives.