Voucher Usage Guide
What types of voucher does Novita provide?
Novita currently offers the following types of vouchers to help users experience the platform’s services:
- New User Voucher: Available to first-time registered users for trying out services across various model types.
- Referral Voucher: Earned by inviting others to register and complete tasks on the platform.
Rate Limits for Model Usage
What are the RPM limits for different users?
RPM (Requests Per Minute) limits vary based on a user’s verification level and account status. For detailed limits, please refer to the official documentation: RPM Adjustment and Upgrade Policy.
Can users apply for increased RPM limits?
Yes. Novita allows flexible RPM upgrades based on usage needs. The rules are as follows:
- DeepSeek models: The platform will strive to accommodate reasonable scaling needs.
- Other models: RPM increases are evaluated based on model cost and actual user behavior, subject to capacity availability.
Application process:
User → Customer Manager / Tech Support → Product Team Review & Approval
What happens if actual usage falls short of the committed RPM?
If the user’s actual RPM remains below the committed level for one consecutive week, the platform will adjust the rate limit as follows:
- Reduce the limit to the peak RPM observed in the past week, or
- Revert to the model’s default rate limit (whichever is lower).
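The downgrade rule above can be sketched as a small function (a minimal illustration, not Novita's actual implementation; the function and parameter names are my own):

```python
# Sketch of the downgrade rule: if actual usage stays below the committed
# RPM for a full week, the new limit becomes the lower of last week's peak
# and the model's default rate limit.

def adjusted_rpm(committed: int, peak_last_week: int, default_limit: int) -> int:
    """Return the rate limit after the weekly usage review."""
    if peak_last_week < committed:              # usage fell short all week
        return min(peak_last_week, default_limit)
    return committed                            # commitment met; no change
```

For example, a user committed to 1000 RPM who peaked at only 300 RPM would be reduced to 300 (or to the model default, if that is lower).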
Is self-service RPM upgrade supported?
Yes. Novita plans to launch an RPM Upgrade Package, allowing users to manage and increase RPM limits independently, without manual approval. For further assistance, please book a call with our sales team.
How do I control the thinking function of zai-org/GLM-4.5 when calling its API?
When calling the zai-org/glm-4.5 API, there are situations where the thinking function is not needed. In these cases, you can turn it off by adding "enable_thinking": false to your request body.
Billing
What is the billing priority when using the Coding Plan?
The system deducts from your Coding Plan quota first. Only after the Coding Plan quota is fully exhausted will charges be applied to your API account balance.
How are the 50M tokens in the Coding Plan calculated?
Tokens are calculated based on base-rate equivalent token counts. Different models carry varying pricing weights proportional to their cost ratios. Both input tokens and output tokens count toward your total usage.
How are cached tokens billed?
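Base-rate equivalent accounting can be illustrated with a short sketch (the model names and weight values below are hypothetical; real weights follow each model's actual price ratio):

```python
# Hypothetical pricing weights relative to the base-rate model.
PRICE_WEIGHT = {
    "model-a": 1.0,   # base-rate model: 1 token consumes 1 quota token
    "model-b": 2.5,   # priced at 2.5x the base rate
}

def quota_consumed(model: str, input_tokens: int, output_tokens: int) -> float:
    """Quota tokens consumed: both input and output tokens count."""
    return (input_tokens + output_tokens) * PRICE_WEIGHT[model]
```

Under these example weights, a request to "model-b" using 1000 input and 500 output tokens would consume 3750 quota tokens from the 50M allowance.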
Cached read tokens are billed at a reduced rate, approximately 1/10th of the standard input token price (exact rates vary by model). Cached write operations are billed at the normal input token rate.
Thinking Mode
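The cached-read discount can be sketched as simple arithmetic (a minimal illustration with my own function name; the 1/10 discount is approximate and varies by model):

```python
def request_cost(input_tokens: int, cached_read_tokens: int, output_tokens: int,
                 input_price: float, output_price: float,
                 cache_discount: float = 0.1) -> float:
    """Estimate request cost with cached reads billed at a discounted rate.

    cached_read_tokens is the cached subset of input_tokens; cached writes
    are billed at the normal input rate, so they need no special handling.
    Prices are per token.
    """
    uncached = input_tokens - cached_read_tokens
    return (uncached * input_price
            + cached_read_tokens * input_price * cache_discount
            + output_tokens * output_price)
```

For example, with 1000 input tokens of which 400 are cached reads, 200 output tokens, and per-token prices of 0.001 (input) and 0.002 (output), the cost is 0.6 + 0.04 + 0.4 = 1.04 rather than 1.4 with no caching.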
How do I disable thinking mode for reasoning models?
Behavior varies by model:
- moonshotai/Kimi-K2-Instruct and zai-org/GLM-4.5: Add "enable_thinking": false to your request body.
- MiniMax-M1: Thinking mode cannot currently be disabled for this model.
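The per-model behavior above can be sketched as a helper that builds an OpenAI-compatible request body (a minimal illustration, not Novita's SDK; the function name is my own, and model ids are taken from the list above):

```python
def build_request(model: str, messages: list) -> dict:
    """Build a chat request body, disabling thinking where supported."""
    body = {"model": model, "messages": messages}
    # Per the docs, Kimi-K2-Instruct and GLM-4.5 accept "enable_thinking": false.
    if model.lower() in {"moonshotai/kimi-k2-instruct", "zai-org/glm-4.5"}:
        body["enable_thinking"] = False
    # MiniMax-M1: thinking cannot currently be disabled, so no flag is added.
    return body
```

Calling `build_request("zai-org/GLM-4.5", [...])` yields a body with `"enable_thinking": False`, while MiniMax-M1 requests are left unchanged.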
Feature Support
Is the top_logprobs parameter supported?
The top_logprobs parameter is currently not supported.
Is structured output (response_format) supported?
Currently, only "type": "json_object" is supported. "type": "json_schema" (with a strict schema definition) is not yet available.
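A request using the supported "json_object" type might look like the following sketch (an OpenAI-compatible payload; the model id and prompt are just examples):

```python
# Sketch: requesting JSON output with the supported "json_object" type.
# "json_schema" with a strict schema definition is not yet available.
payload = {
    "model": "zai-org/glm-4.5",  # example model id from this guide
    "messages": [
        {"role": "user",
         "content": "List three colors as a JSON object under the key 'colors'."}
    ],
    "response_format": {"type": "json_object"},  # the only supported type today
}
```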
Troubleshooting
Why am I receiving 403 errors when calling a model?
Common causes include:
- Insufficient account balance — top up your account and retry.
- Model requires whitelist access — certain models are restricted. See How do I apply for model whitelist access? below.
- Plan restriction — the model may not be included in your current subscription plan.
Why am I receiving 400 errors?
Common causes include:
- Input length exceeds the model’s context window limit.
- Incorrect parameter types or formatting.
- Missing required fields in the request body.
If the error persists, please provide the trace_id from the response to our support team for detailed investigation.