Qwen3 235B A22B

qwen/qwen3-235b-a22b-fp8

Qwen3 235B A22B integrates reasoning (thinking) and non-reasoning modes in a single model and can switch between them within a conversation. Its reasoning capability significantly surpasses that of QwQ, and its general capabilities exceed those of Qwen2.5-72B-Instruct, reaching state-of-the-art (SOTA) level among models of the same scale.
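The per-turn mode switching described above can be sketched with the soft switches that Qwen3's published usage notes describe: appending `/think` or `/no_think` to a user message. This is a minimal sketch, not the provider's documented method. The helper `user_turn` is a hypothetical name, and whether a given hosted endpoint honors these tags is an assumption to verify.

```python
from typing import Dict, List

Message = Dict[str, str]

# Soft-switch tags described in Qwen3's usage notes. Whether a hosted
# endpoint honors them is an assumption, not something this page states.
THINK_TAG = "/think"
NO_THINK_TAG = "/no_think"


def user_turn(text: str, thinking: bool) -> Message:
    """Build a user message that ends with the soft switch for this turn."""
    tag = THINK_TAG if thinking else NO_THINK_TAG
    return {"role": "user", "content": f"{text} {tag}"}


# A two-turn conversation that switches modes between turns.
conversation: List[Message] = [
    {"role": "system", "content": "You are a helpful assistant."},
    user_turn("Prove that the square root of 2 is irrational.", thinking=True),
    {"role": "assistant", "content": "(model reply goes here)"},
    user_turn("Now summarize that in one sentence.", thinking=False),
]

for message in conversation:
    print(message["role"], "->", message["content"])
```

The resulting list can be passed as the `messages` argument of a chat completion request.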

Features

Serverless API

Documentation

qwen/qwen3-235b-a22b-fp8 is available via Novita's serverless API, where you pay per token. There are several ways to call the API, including OpenAI-compatible endpoints.

On-demand deployments

Documentation

On-demand deployments let you run qwen/qwen3-235b-a22b-fp8 on dedicated GPUs with a high-performance serving stack, high reliability, and no rate limits.

Available serverless

Run queries immediately, pay only for usage

Input: $0.2 / M tokens

Output: $0.8 / M tokens

Use the following code examples to integrate our API:
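As a worked example of the per-token pricing above, the cost of a request can be estimated from its token counts. This is a minimal sketch: the function name `estimate_cost_usd` and the sample token counts are illustrative, and actual billing may round or meter differently.

```python
# Listed serverless prices in USD per million tokens.
INPUT_USD_PER_M = 0.2
OUTPUT_USD_PER_M = 0.8


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical request: a 1,500-token prompt and a 20,000-token reply.
cost = estimate_cost_usd(input_tokens=1_500, output_tokens=20_000)
print(f"${cost:.4f}")  # prints $0.0163
```

Output tokens cost four times as much as input tokens, so long reasoning traces dominate the bill for most requests.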

from openai import OpenAI

# Point the OpenAI SDK at Novita's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.novita.ai/openai"
)

response = client.chat.completions.create(
    model="qwen/qwen3-235b-a22b-fp8",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ],
    max_tokens=20000,  # matches the model's maximum output length
    temperature=0.7
)

# Print the assistant's reply from the first choice.
print(response.choices[0].message.content)
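Because this is a reasoning model, the reply text may carry the model's reasoning before the final answer. The open-weight Qwen3 checkpoints wrap that reasoning in `<think>…</think>` tags. The sketch below assumes the tags arrive inline in `message.content`, which this page does not confirm (a provider may instead return reasoning in a separate field). The helper name `split_reasoning` is hypothetical.

```python
import re
from typing import Tuple

# Matches the first <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content: str) -> Tuple[str, str]:
    """Return (reasoning, answer) from a reply that may contain a <think> block."""
    match = THINK_BLOCK.search(content)
    if match is None:
        # No reasoning block: the whole reply is the answer.
        return "", content.strip()
    reasoning = match.group(1).strip()
    answer = (content[:match.start()] + content[match.end():]).strip()
    return reasoning, answer


# Illustrative reply text, not real model output.
sample = "<think>\nThe user greets me; reply politely.\n</think>\n\nI'm doing well, thanks!"
reasoning, answer = split_reasoning(sample)
print(answer)
```

In practice the function would be applied to `response.choices[0].message.content` so that only the final answer is shown to the user.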

Info

Provider: Qwen

Quantization: fp8

Supported functionality

Context length: 40960 tokens

Maximum output: 20000 tokens

Serverless: Supported

Structured Output: Supported

Reasoning: Supported

Input capabilities: text

Output capabilities: text
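The context length and maximum output listed above bound what a single request can ask for. The sketch below picks a safe `max_tokens` value, assuming (as is typical, but not stated on this page) that the 40960-token context window covers the prompt and the completion together. The function name `clamp_max_tokens` is hypothetical, and the prompt token count must come from a tokenizer, which is not shown.

```python
CONTEXT_LENGTH = 40960  # listed context length, in tokens
MAX_OUTPUT = 20000      # listed maximum output, in tokens


def clamp_max_tokens(prompt_tokens: int, requested: int = MAX_OUTPUT) -> int:
    """Return the largest max_tokens value that fits the listed limits."""
    if prompt_tokens >= CONTEXT_LENGTH:
        raise ValueError("prompt alone fills the context window")
    # Assumption: prompt and completion share one context window.
    remaining = CONTEXT_LENGTH - prompt_tokens
    return min(requested, MAX_OUTPUT, remaining)


print(clamp_max_tokens(1_000))   # prints 20000
print(clamp_max_tokens(30_000))  # prints 10960
```

Under this assumption, a request with `max_tokens=20000` leaves room for a prompt of at most 20960 tokens.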