Gemma
Authors: Google DeepMind
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained and instruction-tuned variants.
Applications:
Question answering, summarization, reasoning, image understanding, and more.
Input: Text, such as a question or a prompt, and images.
Output: Generated text in response to the input, such as an answer to a question or a summary of a document.
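To run Gemma 3 with the Transformers pipeline API, first install the latest release: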
```shell
pip install -U transformers
```
Then run inference with the `image-text-to-text` pipeline:

```python
from transformers import pipeline
import torch

# Load the instruction-tuned 27B model as an image-text-to-text pipeline
pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3-27b-it",
    device="cuda",
    torch_dtype=torch.bfloat16
)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    }
]

output = pipe(text=messages, max_new_tokens=200)
# The last chat turn in generated_text holds the model's reply
print(output[0]["generated_text"][-1]["content"])
```
Example Output:
Based on the image, the animal on the candy is a turtle.
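For finer-grained control over loading and generation, you can use the model and processor classes directly: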
```python
# pip install accelerate
from transformers import AutoProcessor, Gemma3ForConditionalGeneration
import torch

model_id = "google/gemma-3-27b-it"

# device_map="auto" places the weights across available devices (requires accelerate)
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, device_map="auto"
).eval()

processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image in detail."}
        ]
    }
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    # Keep only the newly generated tokens, dropping the prompt
    generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)
```
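Gemma 3 is also available in a text-only 1B size. As a minimal sketch (assuming a recent Transformers release whose `text-generation` pipeline accepts chat-style messages), the same pattern works without image input:

```python
from transformers import pipeline
import torch

# Text-only sketch; google/gemma-3-1b-it is the text-only
# 1B instruction-tuned variant.
pipe = pipeline(
    "text-generation",
    model="google/gemma-3-1b-it",
    device="cuda",
    torch_dtype=torch.bfloat16
)

messages = [
    {"role": "user", "content": "Summarize the idea of open-weight models in two sentences."}
]

output = pipe(messages, max_new_tokens=100)
# As above, the final chat turn holds the model's reply
print(output[0]["generated_text"][-1]["content"])
```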
Citation:

```bibtex
@article{gemma_2025,
    title={Gemma 3},
    url={https://goo.gle/Gemma3Report},
    publisher={Kaggle},
    author={Gemma Team},
    year={2025}
}
```
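Pre-training token counts by model size: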
| Model Size | Tokens Used for Training | 
|---|---|
| 27B | 14 trillion | 
| 12B | 12 trillion | 
| 4B | 4 trillion | 
| 1B | 2 trillion | 
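Pre-trained model benchmark results on reasoning and factuality tasks: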
| Benchmark | Metric | 1B | 4B | 12B | 27B | 
|---|---|---|---|---|---|
| HellaSwag | 10-shot | 62.3 | 77.2 | 84.2 | 85.6 | 
| BoolQ | 0-shot | 63.2 | 72.3 | 78.8 | 82.4 | 
| PIQA | 0-shot | 73.8 | 79.6 | 81.8 | 83.3 | 
| SocialIQA | 0-shot | 48.9 | 51.9 | 53.4 | 54.9 | 
| TriviaQA | 5-shot | 39.8 | 65.8 | 78.2 | 85.5 | 
| Natural Questions | 5-shot | 9.48 | 20.0 | 31.4 | 36.1 | 
| ARC-c | 25-shot | 38.4 | 56.2 | 68.9 | 70.6 | 
| ARC-e | 0-shot | 73.0 | 82.4 | 88.3 | 89.0 | 
| WinoGrande | 5-shot | 58.2 | 64.7 | 74.3 | 78.8 | 
| BIG-Bench Hard | few-shot | 28.4 | 50.9 | 72.6 | 77.7 | 
| DROP | 1-shot | 42.4 | 60.1 | 72.2 | 77.2 | 
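STEM and code benchmarks: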
| Benchmark | Metric | 4B | 12B | 27B | 
|---|---|---|---|---|
| MMLU | 5-shot | 59.6 | 74.5 | 78.6 | 
| AGIEval | 3-5-shot | 42.1 | 57.4 | 66.2 | 
| GSM8K | 8-shot | 38.4 | 71.0 | 82.6 | 
| HumanEval | 0-shot | 36.0 | 45.7 | 48.8 | 
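Multilingual benchmarks: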
| Benchmark | 1B | 4B | 12B | 27B | 
|---|---|---|---|---|
| MGSM | 2.04 | 34.7 | 64.3 | 74.3 | 
| Global-MMLU-Lite | 24.9 | 57.0 | 69.4 | 75.7 | 
| FloRes | 29.5 | 39.2 | 46.0 | 48.8 | 
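Multimodal benchmarks: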
| Benchmark | 4B | 12B | 27B | 
|---|---|---|---|
| COCOcap | 102 | 111 | 116 | 
| DocVQA (val) | 72.8 | 82.3 | 85.6 | 
| TextVQA (val) | 58.9 | 66.5 | 68.6 | 
| RealWorldQA | 45.5 | 52.2 | 53.9 |