Qwen2.5 7B Instruct

20% OFF

qwen/qwen2.5-7b-instruct

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, the Qwen team has released a number of base and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements over Qwen2:
- Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.
- Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. It is also more resilient to diverse system prompts, which improves role-play and condition-setting for chatbots.
- Long-context support for up to 128K tokens, with generation of up to 8K tokens.
- Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

Features

Serverless API

qwen/qwen2.5-7b-instruct is available through Novita's serverless API, where you pay per token. The API can be called in several ways, including through OpenAI-compatible endpoints.
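
To illustrate what "OpenAI-compatible" means at the HTTP level, here is a minimal sketch that builds (but does not send) a chat completion request using only the Python standard library. It assumes the base URL from the Python example on this page and that, as with OpenAI-style clients, the chat endpoint sits at `/chat/completions` under that base URL; confirm the exact path against Novita's API docs. `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: base URL taken from this page's Python example; OpenAI-style
# clients append "/chat/completions" to it for chat requests.
BASE_URL = "https://api.novita.ai/openai"


def build_chat_request(api_key: str, prompt: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": "qwen/qwen2.5-7b-instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # API key sent as a bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, you could pass the request to `urllib.request.urlopen` and read `choices[0].message.content` from the decoded JSON response body.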

On-demand Deployments

On-demand deployments let you run qwen/qwen2.5-7b-instruct on dedicated GPUs, using a high-performance serving stack with high reliability and no rate limits.

Available Serverless

Run queries immediately, pay only for usage

Input: $0.056 / M tokens (20% off the regular $0.07 / M tokens)

Output: $0.056 / M tokens (20% off the regular $0.07 / M tokens)
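
Because billing is per token, the cost of a request is a weighted sum: (input tokens × input price + output tokens × output price) / 1,000,000. The sketch below encodes that formula with the discounted prices listed above and uses `Decimal` to avoid floating-point rounding on currency. `estimate_cost` is a hypothetical helper, and actual invoices may round or aggregate differently.

```python
from decimal import Decimal

# Discounted prices listed on this page, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("0.056")
OUTPUT_PRICE_PER_M = Decimal("0.056")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: Decimal = INPUT_PRICE_PER_M,
    output_price: Decimal = OUTPUT_PRICE_PER_M,
) -> Decimal:
    """Estimate the USD cost of a request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * input_price + output_tokens * output_price
    return total / TOKENS_PER_M
```

For example, a request with 10,000 input tokens and 2,000 output tokens comes to 12,000 × $0.056 / 1,000,000 = $0.000672 at the discounted price, or $0.00084 at the regular $0.07 rate.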

Use the following code example to integrate with our API:

from openai import OpenAI

# Point the OpenAI client at Novita's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.novita.ai/openai"
)

response = client.chat.completions.create(
    model="qwen/qwen2.5-7b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ],
    # The prompt and the completion share the 32,000-token context window,
    # so max_tokens must leave room for the input messages.
    max_tokens=1024,
    temperature=0.7
)

# Print the assistant's reply.
print(response.choices[0].message.content)
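
The Info section lists a 32,000-token context length and a 32,000-token max output. Since prompt and completion share the same window, a `max_tokens` value of 32,000 leaves no room for any input, and OpenAI-compatible servers commonly reject such requests (an assumption about this endpoint's exact behavior). The following minimal sketch clamps `max_tokens` to what remains. `safe_max_tokens` is a hypothetical helper, and the prompt token count is assumed to come from elsewhere, for example `response.usage.prompt_tokens` from an earlier call with the same messages or a count from the Qwen2.5 tokenizer.

```python
# Limits listed in this page's Info section (tokens).
CONTEXT_LENGTH = 32_000
MAX_OUTPUT = 32_000


def safe_max_tokens(prompt_tokens: int, desired: int) -> int:
    """Return a max_tokens value that keeps prompt + completion inside the window."""
    if prompt_tokens < 0 or desired < 1:
        raise ValueError("prompt_tokens must be >= 0 and desired must be >= 1")
    remaining = CONTEXT_LENGTH - prompt_tokens
    if remaining < 1:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens; no room left in the "
            f"{CONTEXT_LENGTH}-token context window"
        )
    # Never exceed the caller's request, the remaining window, or the output cap.
    return min(desired, remaining, MAX_OUTPUT)
```

For instance, a 1,000-token prompt with a requested 32,000-token completion is clamped to 31,000, while a 200-token prompt asking for 1,024 tokens is left unchanged.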

Info

Provider: Qwen
Quantization: bf16

Supported Functionality

Context Length: 32,000 tokens
Max Output: 32,000 tokens
Serverless: Supported
Function Calling: Supported
Structured Output: Supported
Input Capabilities: text
Output Capabilities: text
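
Since function calling is listed as supported, the following sketch shows the request shape in the OpenAI `tools` format, which the OpenAI-compatible endpoint is assumed to accept. The `get_weather` tool and both helper functions are hypothetical, included only to illustrate the structure. The payload is built as keyword arguments for `client.chat.completions.create`, and the arguments the model chooses come back as a JSON string that must be decoded.

```python
import json

# Hypothetical tool definition in the OpenAI "tools" format (illustration only).
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}


def build_tool_call_kwargs(user_prompt: str) -> dict:
    """Keyword arguments for client.chat.completions.create with one tool offered."""
    return {
        "model": "qwen/qwen2.5-7b-instruct",
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",  # let the model decide whether to call the tool
        "max_tokens": 512,
    }


def parse_tool_arguments(arguments_json: str) -> dict:
    """Decode the JSON string found in tool_calls[i].function.arguments."""
    args = json.loads(arguments_json)
    if not isinstance(args, dict):
        raise ValueError("tool arguments must be a JSON object")
    return args
```

With the `client` from the Python example, you could call `client.chat.completions.create(**build_tool_call_kwargs("What's the weather in Paris?"))` and, if `response.choices[0].message.tool_calls` is not `None`, decode `tool_calls[0].function.arguments` with `parse_tool_arguments`. The model may also answer in plain text instead of calling the tool, so handle that case too.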