Voucher Usage Guide

What types of vouchers does Novita provide?

Novita currently offers the following types of vouchers to help users experience the platform’s services:
  • New User Voucher: Available to first-time registered users to try out services across various model types.
  • Referral Voucher: Earned by inviting others to register and complete tasks on the platform.

Rate Limits for Model Usage

What are the RPM limits for different users?

RPM (Requests Per Minute) limits vary based on a user’s verification level and account status. For detailed limits, please refer to the official documentation.
👉 For further assistance, please book a call with our sales team.

RPM Adjustment and Upgrade Policy

Can users apply for increased RPM limits?

Yes. Novita allows flexible RPM upgrades based on usage needs. The rules are as follows:
  • DeepSeek models: The platform will strive to accommodate reasonable scaling needs.
  • Other models: RPM increases are evaluated based on model cost and actual user behavior, subject to capacity availability.

Application process:

User → Customer Manager / Tech Support → Product Team Review & Approval

What happens if actual usage falls short of the committed RPM?

If the user’s actual RPM remains below the committed level for one full week, the platform will reduce the rate limit to whichever of the following is lower:
  • The peak RPM observed during the past week, or
  • The model’s default rate limit.
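The adjustment rule amounts to taking the lower of the two values. As a tiny sketch (hypothetical helper name; values are requests per minute):

```python
def adjusted_limit(peak_rpm_last_week, default_limit):
    # The reduced limit is whichever is lower: last week's observed
    # peak RPM or the model's default rate limit.
    return min(peak_rpm_last_week, default_limit)
```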

Is self-service RPM upgrade supported?

Not yet, but it is planned. Novita will launch an RPM Upgrade Package, allowing users to manage and increase RPM limits independently, without manual approval. For further assistance, please book a call with our sales team.

How do I control the thinking function of zai-org/GLM-4.5 when calling its API?

When calling the zai-org/glm-4.5 API, there are situations where the thinking function is not needed. In these cases, you can turn it off by adding the field
"enable_thinking": false
to your request body, for example:
{
  "model": "zai-org/glm-4.5",
  "messages": [
    {
      "role": "user",
      "content": "How is the weather in New York?"
    }
  ],
  "temperature": 0.7,
  "stream": false,
  "max_tokens": 500,
  "tool_choice": "auto",
  "enable_thinking": false
}
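As a sketch, the same request body can be assembled in Python and sent with any HTTP client. The endpoint URL and auth header are not specified in this guide, so sending is left out; check Novita's API reference for the exact values.

```python
import json

# Sketch: build a glm-4.5 request body with the thinking function
# disabled. Send it with your preferred HTTP client (e.g. urllib, curl).
payload = {
    "model": "zai-org/glm-4.5",
    "messages": [
        {"role": "user", "content": "How is the weather in New York?"}
    ],
    "temperature": 0.7,
    "stream": False,
    "max_tokens": 500,
    "tool_choice": "auto",
    "enable_thinking": False,  # turns the thinking function off
}
print(json.dumps(payload, indent=2))
```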

Billing

What is the billing priority when using the Coding Plan?

The system deducts from your Coding Plan quota first. Only after the Coding Plan quota is fully exhausted will charges be applied to your API account balance.
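As an illustration of this priority, a minimal sketch (hypothetical helper and units; not Novita's actual billing code):

```python
def deduct(cost, plan_quota, api_balance):
    # Draw from the Coding Plan quota first; only the remainder,
    # if any, is charged to the API account balance.
    from_plan = min(cost, plan_quota)
    from_balance = cost - from_plan
    return plan_quota - from_plan, api_balance - from_balance
```

A 30-unit charge against a 50-unit quota leaves the balance untouched; an 80-unit charge exhausts the quota and bills the remaining 30 to the balance.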

How are the 50M tokens in the Coding Plan calculated?

Tokens are calculated based on base-rate equivalent token counts. Different models carry varying pricing weights proportional to their cost ratios. Both input tokens and output tokens count toward your total usage.
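As a sketch of how base-rate-equivalent counting works (the model names and weights below are made-up illustrations; real weights are set by Novita in proportion to each model's cost ratio):

```python
# Hypothetical pricing weights -- illustrative only.
PRICING_WEIGHTS = {"cheap-model": 1.0, "premium-model": 2.5}

def quota_tokens_used(model, input_tokens, output_tokens):
    # Both input and output tokens count toward the quota,
    # scaled by the model's pricing weight.
    return (input_tokens + output_tokens) * PRICING_WEIGHTS[model]
```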

How are cached tokens billed?

Cached read tokens are billed at a reduced rate, approximately 1/10th of the standard input token price (exact rates vary by model). Cached write operations are billed at the normal input token rate.
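A rough cost sketch under these rules (the 1/10 ratio is approximate and model-dependent; helper names are hypothetical):

```python
def input_cost(fresh_tokens, cached_read_tokens, input_price,
               cached_read_ratio=0.1):
    # Cache writes bill at the normal input rate, so count them in
    # fresh_tokens; cached reads bill at roughly 1/10th of it.
    return (fresh_tokens * input_price
            + cached_read_tokens * input_price * cached_read_ratio)
```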

Thinking Mode

How do I disable thinking mode for reasoning models?

Behavior varies by model:
  • moonshotai/Kimi-K2-Instruct and zai-org/GLM-4.5: Add "enable_thinking": false to your request body.
  • MiniMax-M1: Thinking mode cannot currently be disabled for this model.

For example, to disable thinking for moonshotai/Kimi-K2-Instruct:
{
  "model": "moonshotai/Kimi-K2-Instruct",
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "enable_thinking": false
}

Feature Support

Is the top_logprobs parameter supported?

The top_logprobs parameter is currently not supported.

Is structured output (response_format) supported?

Currently, only "type": "json_object" is supported. "type": "json_schema" (with a strict schema definition) is not yet available.
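A minimal request-body sketch using the supported mode (the model name here is just one of the chat models mentioned in this guide):

```python
import json

# Only {"type": "json_object"} is supported today; a "json_schema"
# response_format with a strict schema definition is not yet available.
payload = {
    "model": "zai-org/glm-4.5",
    "messages": [
        {"role": "user",
         "content": "Return three US cities as a JSON object."}
    ],
    "response_format": {"type": "json_object"},
}
print(json.dumps(payload, indent=2))
```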

Troubleshooting

Why am I receiving 403 errors when calling a model?

Common causes include:
  1. Insufficient account balance — top up your account and retry.
  2. Model requires whitelist access — certain models are restricted. See How do I apply for model whitelist access? below.
  3. Plan restriction — the model may not be included in your current subscription plan.

Why am I receiving 400 errors?

Common causes include:
  • Input length exceeds the model’s context window limit.
  • Incorrect parameter types or formatting.
  • Missing required fields in the request body.
Please provide your trace_id to our support team for detailed investigation.

What should I do if the API returns a “server overload” error?

This indicates temporary resource pressure on the platform. Retry your request after a brief delay (we recommend exponential backoff). If the issue persists, contact us for status confirmation.
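A minimal retry sketch with exponential backoff and jitter (`send_request` is a stand-in for whatever HTTP client call you use; this is not a Novita SDK function, and the overload error is assumed to surface as an exception):

```python
import random
import time

def call_with_backoff(send_request, max_retries=5, base_delay=1.0):
    # Retry on failure, doubling the wait each attempt and adding
    # jitter so concurrent clients do not retry in lockstep.
    for attempt in range(max_retries):
        try:
            return send_request()
        except RuntimeError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```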

Model Access

How do I apply for model whitelist access?

Contact our support team via Discord or the in-platform chat with your account information and the specific model name(s) you require. Our team will review and process your request.