
Llama 4 Maverick Instruct

meta-llama/llama-4-maverick-17b-128e-instruct-fp8
Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass (400B total). It accepts multilingual text and image input and produces text and code output in 12 languages. Optimized for vision-language tasks, Maverick is instruction-tuned for assistant-like behavior, image reasoning, and general-purpose multimodal interaction. It features early fusion for native multimodality and a 1-million-token context window. The model was trained on a curated mixture of public, licensed, and Meta-platform data totaling ~22 trillion tokens, with a knowledge cutoff of August 2024. Released on April 5, 2025 under the Llama 4 Community License, Maverick is suited for research and commercial applications that require advanced multimodal understanding and high throughput.

Features

Serverless API

meta-llama/llama-4-maverick-17b-128e-instruct-fp8 is available via Novita's serverless API, where you pay per token. The API can be called in several ways, including through OpenAI-compatible endpoints.

On-demand Deployments

On-demand deployments let you run meta-llama/llama-4-maverick-17b-128e-instruct-fp8 on dedicated GPUs backed by a high-performance serving stack, with high reliability and no rate limits.

Available Serverless

Run queries immediately, pay only for usage

Input: $0.17 / M tokens
Output: $0.85 / M tokens
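
As a quick illustration of how the per-token pricing works, the sketch below estimates the cost of a single request from its token counts (the token figures are hypothetical):

# Estimate request cost from this model's per-million-token rates.
INPUT_PRICE_PER_M = 0.17   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.85  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 2,000-token prompt with a 500-token completion (hypothetical numbers).
print(f"${estimate_cost(2_000, 500):.6f}")  # -> $0.000765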

Use the following code examples to integrate with our API:

from openai import OpenAI

# Point the OpenAI SDK at Novita's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.novita.ai/openai"
)

response = client.chat.completions.create(
    model="meta-llama/llama-4-maverick-17b-128e-instruct-fp8",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ],
    max_tokens=8192,   # matches the model's 8,192-token output cap
    temperature=0.7
)

print(response.choices[0].message.content)
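
For long completions you may prefer to receive tokens as they are generated. Assuming the OpenAI-compatible endpoint supports standard OpenAI streaming (stream=True is an assumption here, not something stated on this page), a minimal sketch looks like this:

from openai import OpenAI

client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.novita.ai/openai"
)

# Request an incremental stream instead of waiting for the full reply.
stream = client.chat.completions.create(
    model="meta-llama/llama-4-maverick-17b-128e-instruct-fp8",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    max_tokens=8192,
    temperature=0.7,
    stream=True  # assumed supported by the OpenAI-compatible endpoint
)

for chunk in stream:
    # Each chunk carries a delta; content may be absent on some chunks.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()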

Info

Provider: Meta
Quantization: fp8

Supported Functionality

Context Length: 1,048,576 tokens
Max Output: 8,192 tokens
Input Capabilities: text
Output Capabilities: text
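
The context length above is the combined budget for prompt and completion tokens, while max output caps a single response. A small sketch of how a client might clamp max_tokens so a request stays inside both limits (clamp_max_tokens is a hypothetical helper; prompt_tokens would come from your own tokenizer count):

CONTEXT_LENGTH = 1_048_576  # total window: prompt + completion tokens
MAX_OUTPUT = 8_192          # per-request output ceiling for this model

def clamp_max_tokens(prompt_tokens: int, requested: int = MAX_OUTPUT) -> int:
    """Keep prompt + completion within the context window and output cap."""
    remaining = CONTEXT_LENGTH - prompt_tokens
    return max(0, min(requested, MAX_OUTPUT, remaining))

# A 1,040,000-token prompt still leaves 8,576 tokens, so the full 8,192 fits.
print(clamp_max_tokens(1_040_000))  # -> 8192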