Documentation Index
Fetch the complete documentation index at: https://novita.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
Last Updated: 2025-12-03
Status: Supported (OpenAI-Compatible)
1. Overview
Interleaved Thinking is an advanced reasoning framework that enables models to perform explicit reasoning steps between tool calls.
Models with Interleaved Thinking can:
- Reflect on the current environment and tool outputs
- Decide the next action based on updated reasoning
- Maintain a continuous reasoning chain across multiple tool invocations
- Provide transparent, inspectable multi-step thinking through
reasoning_details or reasoning_content
This capability transforms traditional function-calling into agent-level tool use, making complex workflows more accurate, reliable, and context-aware.
Novita fully supports Interleaved Thinking for all models that natively expose reasoning streams (e.g., MiniMax-M2 and other OpenAI-compatible reasoning models).
2. Key Concepts
2.1 Interleaving
Instead of executing a single reasoning phase followed by a tool call, the model performs:
Reason → Tool Call → Observe → Reason → Tool Call → ...
This allows the model to adjust its strategy dynamically based on previous tool outputs.
2.2 Reasoning Details (reasoning_details)
For some models, the content of the model’s thinking will be returned in the form of a separate structure:
"reasoning_details": [
{
"type": "reasoning.text",
"format": "openai-responses-v1",
"text": "Model’s step-by-step reasoning..."
}
]
For such models, Novita supports returning this field in both streaming and non-streaming modes.
2.3 Conversation Memory Requirement
To maintain reasoning continuity: You must append the model’s full response including reasoning_details, tool_calls, and content to subsequent messages.
Failing to preserve the chain may result in:
- Incorrect tool use
- Lost reasoning context
- Repeated or circular tool calls
- Reduced reliability
This requirement mirrors OpenAI’s reasoning APIs.
3. API Behavior
No changes are required on the user side.
Interleaved Thinking works with the standard OpenAI-compatible Chat Completions API.
The model may return the following fields:
reasoning_content: original thinking content
reasoning_details: structured reasoning segments, this field is optioanl
tool_calls: tool invocation plan
content: natural language output
These extend the standard OpenAI format.
4. Example Request (MiniMax-M2)
{
"model": "minimax/minimax-m2",
"messages": [
{
"role": "user",
"content": "How's the weather in San Francisco?"
},
{
"role": "assistant",
"name": "MiniMax AI",
"content": "",
"tool_calls": [
{
"id": "call_function_asqvfevfc8af_1",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"location\": \"San Francisco, US\"}"
}
}
],
"reasoning_details": [
{
"type": "reasoning.text",
"id": "reasoning-text-1",
"format": "openai-responses-v1",
"index": 0,
"text": "The user is asking about the weather in San Francisco..."
}
]
},
{
"role": "tool",
"tool_call_id": "call_function_asqvfevfc8af_1",
"content": "24℃, sunny"
}
],
"stream": true,
"reasoning_split": true,
"max_tokens": 1024,
"temperature": 0.7,
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for a specific location",
"parameters": {
"type": "object",
"properties": {
"location": { "type": "string" }
},
"required": ["location"]
}
}
}
]
}
5. Example Response (Non-Streaming)
{
"id": "07a4dedfdb1498b045498dfd42497639",
"object": "chat.completion",
"created": 1764303147,
"model": "MiniMax",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "",
"name": "MiniMax",
"tool_calls": [
{
"index": 0,
"id": "call_function_9w7wq1j9zmpl_1",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"location\": \"ShangHai\"}"
}
}
],
"reasoning_content": "The user asked for Shanghai weather...",
"reasoning_details": [
{
"type": "reasoning.text",
"text": "The user is asking about the weather in Shanghai...",
"id": "reasoning-text-1",
"format": "openai-responses-v1",
"index": 0
}
]
},
"finish_reason": "tool_calls"
}
]
}
6. Streaming Response Example
{
"id": "664c5ad870c1888fbcbd267d9829e354",
"object": "chat.completion.chunk",
"created": 1764303504,
"model": "minimax-m2",
"choices": [
{
"index": 0,
"delta": {
"role": "assistant",
"reasoning_content": "...\n\nThe user has specifically asked...",
"reasoning_details": [
{
"type": "reasoning.text",
"text": ".\n\nThe user has specifically asked...",
"id": "reasoning-text-1",
"format": "openai-responses-v1",
"index": 0
}
]
},
"finish_reason": null
}
]
}
7. Developer Notes
7.1 Models That Support Interleaved Thinking
All models exposing reasoning_details through OpenAI-compatible APIs, including:
- MiniMax-M2
- (Upcoming) Novita Reasoning Series
- Other reasoning-enabled partner models
7.2 Pricing
Billing is based on reasoning tokens, following the model’s pricing rules.
reasoning_details will increase token usage.
7.3 Error Handling
You may encounter:
- Missing tool parameters
- Recursive or repeated tool calls
- Incorrect assumptions in the reasoning phase
Ensure your application validates tool arguments and handles model errors gracefully.
8. Best Practices
✓ Always include full model messages in the next request
Include:
content
tool_calls
reasoning_details
✓ Enable streaming for long-chain reasoning
Streaming allows the client to:
- Monitor the reasoning process
- Detect incorrect tool plans early
- Provide faster user feedback
✓ Combine with server-side state machines for stability
For production systems, we recommend pairing Interleaved Thinking with deterministic guardrails, e.g.:
- Parameter validators
- Execution sandbox
- Maximum recursion safeguards
9. Summary
Interleaved Thinking significantly enhances multi-step reasoning and tool-use reliability:
- Transparent and inspectable step-wise reasoning
- Adaptive planning between tool invocations
- Stronger context retention across long workflows
- Fully compatible with OpenAI-style Chat Completions API
Novita will continue expanding support for advanced reasoning models, bringing agent-level intelligence to the API layer.