Qwen

Qwen3 Next 80B A3B Thinking

qwen/qwen3-next-80b-a3b-thinking
Qwen3-Next uses a highly sparse MoE design: 80B total parameters, of which only about 3B are activated per inference step. Experiments show that, with global load balancing, increasing the total expert parameters while keeping the number of activated experts fixed steadily reduces training loss. Compared to Qwen3's MoE (128 total experts, 8 routed), Qwen3-Next expands to 512 total experts, combining 10 routed experts with 1 shared expert, which maximizes resource usage without hurting performance. Qwen3-Next-80B-A3B-Thinking excels at complex reasoning tasks: it outperforms higher-cost models such as Qwen3-30B-A3B-Thinking-2507 and Qwen3-32B-Thinking, outperforms the closed-source Gemini-2.5-Flash-Thinking on multiple benchmarks, and approaches the performance of Qwen's top-tier model, Qwen3-235B-A22B-Thinking-2507.
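To make the "10 routed + 1 shared" idea concrete, here is a minimal sketch of top-k expert routing for a single token. It is an illustration only, not Qwen's implementation: the random router scores, the softmax over the selected experts, and the assumption that the shared expert sits outside the 512-expert pool are all choices made for this example.

```python
import math
import random

NUM_EXPERTS = 512  # size of the routed expert pool (from the description above)
TOP_K = 10         # routed experts activated per token


def route_token(router_logits):
    """Pick the TOP_K highest-scoring experts and normalize their weights."""
    top = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:TOP_K]
    # Softmax over the selected logits only (one common convention, assumed here).
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


random.seed(0)
# Hypothetical router scores; a real router computes them from the token's hidden state.
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routing = route_token(logits)

# The shared expert runs for every token, so 11 expert networks are active.
active_experts = len(routing) + 1
print(active_experts)
print(f"{3 / 80:.2%} of parameters active per step (about 3B of 80B)")
```

Only the selected experts run for a given token, which is why the activated parameter count stays near 3B even though the model holds 80B parameters in total.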

Features

Serverless API

Documentation

qwen/qwen3-next-80b-a3b-thinking is available via Novita's serverless API, where you pay per token. The API can be called in several ways, including through OpenAI-compatible endpoints.

On-demand deployments

Documentation

On-demand deployments let you run qwen/qwen3-next-80b-a3b-thinking on dedicated GPUs with a high-performance serving stack, high reliability, and no rate limits.

Serverless available

Run requests immediately and pay only for what you use

Input: $0.15 / M tokens
Output: $1.5 / M tokens
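As a rough guide to what a request costs at these rates, the sketch below multiplies token counts by the listed prices. The function name and the example token counts are made up for illustration, and it assumes reasoning tokens are billed as output tokens, which this page does not state.

```python
INPUT_USD_PER_M = 0.15  # input price listed above, USD per million tokens
OUTPUT_USD_PER_M = 1.5  # output price listed above, USD per million tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD from its token counts."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical request: a 2,000-token prompt and an 8,000-token response.
cost = estimate_cost_usd(2_000, 8_000)
print(f"${cost:.4f}")  # $0.0123
```

Because output is priced at ten times the input rate, long responses dominate the bill for a thinking model.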

Use the following code examples to integrate our API:

from openai import OpenAI

# Point the OpenAI SDK at Novita's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.novita.ai/openai"
)

response = client.chat.completions.create(
    model="qwen/qwen3-next-80b-a3b-thinking",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ],
    max_tokens=32768,  # maximum output length listed for this model
    temperature=0.7
)

# Print the assistant's reply.
print(response.choices[0].message.content)
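The example above requests the full 32768-token output. With long prompts it can help to cap `max_tokens` so the request stays within the model's limits. The helper below is a hypothetical sketch: it uses the context length (131072) and maximum output (32768) listed on this page, and assumes that prompt and output tokens share the same context window, which the page does not confirm.

```python
CONTEXT_LENGTH = 131072    # context length listed on this page
MAX_OUTPUT_TOKENS = 32768  # maximum output listed on this page


def clamp_max_tokens(prompt_tokens: int, requested: int = MAX_OUTPUT_TOKENS) -> int:
    """Return a max_tokens value that fits the remaining context window."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    if prompt_tokens >= CONTEXT_LENGTH:
        raise ValueError("prompt does not fit in the context window")
    remaining = CONTEXT_LENGTH - prompt_tokens
    return min(requested, MAX_OUTPUT_TOKENS, remaining)


print(clamp_max_tokens(1_000))    # 32768
print(clamp_max_tokens(120_000))  # 11072
```

The returned value can be passed as `max_tokens` in the request; the prompt token count has to come from a tokenizer or a prior usage report.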

Info

Provider: Qwen
Quantization: bf16

Supported features

Context length: 131072 tokens
Max output: 32768 tokens
Serverless: Supported
Function calling: Supported
Reasoning: Supported
Anthropic API: Supported
Input capabilities: text
Output capabilities: text
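Since function calling is listed as supported, a request can carry tool definitions in the OpenAI chat-completions format. The sketch below is illustrative only: `get_weather`, `run_tool_call`, and `ask_with_tools` are hypothetical names, and it assumes Novita's endpoint accepts the standard `tools` parameter and returns `tool_calls` in the usual OpenAI shape.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical local tool; returns a canned string for illustration."""
    return f"Sunny in {city}"


# Tool schema in the OpenAI chat-completions "tools" format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

LOCAL_TOOLS = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Dispatch one tool call; arguments arrive as a JSON-encoded string."""
    args = json.loads(arguments_json)
    return LOCAL_TOOLS[name](**args)


def ask_with_tools(client, question: str):
    """Send a question with tools attached and run any requested tool calls.

    `client` is an OpenAI client configured as in the code example on this page.
    """
    response = client.chat.completions.create(
        model="qwen/qwen3-next-80b-a3b-thinking",
        messages=[{"role": "user", "content": question}],
        tools=TOOLS,
    )
    message = response.choices[0].message
    # tool_calls is None when the model answers directly.
    return [
        run_tool_call(call.function.name, call.function.arguments)
        for call in (message.tool_calls or [])
    ]


print(run_tool_call("get_weather", '{"city": "Paris"}'))  # Sunny in Paris
```

In a full loop, each tool result would be appended to `messages` as a `tool` role message and sent back so the model can produce its final answer.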