Overview

Reasoning models are language models optimized for complex problem solving. By generating explicit reasoning steps (chain-of-thought) before the final answer, they improve accuracy on analytical tasks.

Typical Use Cases

  • Complex Problem Solving: Suitable for tasks requiring step-by-step logic, such as math or scientific reasoning.
  • Decision Support Systems: Helps explain the logic behind conclusions by providing detailed reasoning processes.
  • Education and Training: Assists learners in understanding complex concepts by presenting derivation processes clearly.

Installation & Setup

Before using reasoning models, make sure the latest OpenAI SDK is installed:

pip install -U openai

API Usage

Use the /chat/completions endpoint to invoke reasoning models.

Request Parameters

  • max_tokens: Sets the maximum number of tokens the model can return.
  • temperature: Recommended between 0.5 and 0.7 (suggested: 0.6) to balance creativity and logic.
  • top_p: Recommended value is 0.95.
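
These recommendations map directly onto request arguments. A minimal sketch of a request using the suggested values (the prompt is illustrative; the client setup matches the examples below):

from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.novita.ai/v3/openai")

response = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
    max_tokens=4096,   # upper bound on returned tokens
    temperature=0.6,   # suggested value within the 0.5-0.7 range
    top_p=0.95,        # recommended value
)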

Example Code

Streaming Response

from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.novita.ai/v3/openai")
messages = [
    {"role": "user", "content": "Explain Newton's Second Law."}
]

response = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=messages,
    stream=True,
    max_tokens=4096
)

content = ""
reasoning_content = ""
for chunk in response:
    # Some chunks may arrive without choices (e.g. a trailing usage chunk).
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # The answer and the reasoning trace stream in separate delta fields;
    # reasoning_content is a provider extension, so read it defensively.
    if delta.content:
        content += delta.content
    if getattr(delta, "reasoning_content", None):
        reasoning_content += delta.reasoning_content

print("Final Answer:", content)
print("Reasoning Steps:", reasoning_content)

Non-Streaming Response

response = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[
        {"role": "user", "content": "What is the greenhouse effect? How can it be mitigated?"}
    ],
    stream=False,
    max_tokens=4096
)

content = response.choices[0].message.content
# reasoning_content is a provider extension on the message object; read it
# defensively in case a model variant omits the field.
reasoning_content = getattr(response.choices[0].message, "reasoning_content", None)

print("Final Answer:", content)
print("Reasoning Steps:", reasoning_content)

Context Management

The reasoning trace is not automatically carried over to the next round of dialogue. Maintain the message history yourself, appending only the final answer (content), never reasoning_content:

messages.append({"role": "assistant", "content": content})
messages.append({"role": "user", "content": "Please continue explaining the solution."})

Supported Models

DeepSeek Series

  • deepseek/deepseek-r1-0528
  • deepseek/deepseek-r1-0528-qwen3-8b
  • deepseek/deepseek-r1-turbo
  • deepseek/deepseek-r1-distill-qwen-32b
  • deepseek/deepseek-r1-distill-qwen-14b
  • deepseek/deepseek-r1-distill-llama-70b
  • deepseek/deepseek-r1-distill-llama-8b
  • deepseek/deepseek-r1/community

Qwen Series

  • qwen/qwen3-235b-a22b-fp8
  • qwen/qwen3-30b-a3b-fp8
  • qwen/qwen3-32b-fp8
  • qwen/qwen3-8b-fp8
  • qwen/qwen3-4b-fp8

GLM Series

  • thudm/glm-z1-rumination-32b-0414
  • thudm/glm-z1-9b-0414
  • thudm/glm-z1-32b-0414

Llama Series

  • meta-llama/llama-4-maverick-17b-128e-instruct-fp8

MiniMax Series

  • minimaxai/minimax-m1-80k

Gryphe Series

  • gryphe/mythomax-l2-13b

Sao10K Series

  • Sao10K/L3-8B-Stheno-v3.2

Mistral AI Series

  • mistralai/mistral-nemo
  • mistralai/mistral-7b-instruct

Other Series

  • microsoft/wizardlm-2-8x22b
  • nousresearch/hermes-2-pro-llama-3-8b
  • cognitivecomputations/dolphin-mixtral-8x22b
  • sophosympatheia/midnight-rose-70b

Visit the model library for the latest list and details.


Billing

  • Billing is based on the number of input and output tokens.
  • Please refer to each model’s pricing page for specific billing rules and token conversion details.

Notes & Best Practices

  • Avoid placing reasoning instructions in the system message. Instead, make the intent explicit in the user message.
  • For mathematical tasks, clearly instruct the model, e.g., “Please reason step by step and provide a final answer.”
  • To prevent the model from skipping its reasoning, consider asking it to put the final answer on a new line after the reasoning.
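
Putting these practices together, a prompt for a math task might look like the following sketch (the wording is illustrative):

messages = [
    {
        "role": "user",
        # Keep the reasoning instruction in the user message, not the system message.
        "content": (
            "Solve 3x + 5 = 20. Please reason step by step, "
            "and put the final answer on a new line."
        ),
    }
]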