Pricing
The Novita AI LLM API is billed per token. Each request incurs charges for:
- Input Tokens: tokens consumed by the prompt
- Output Tokens: tokens generated by the model
Rates vary by model. Refer to the Pricing Page for details.
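As a rough illustration of per-token billing, the sketch below estimates a request's cost from its token counts. The rates used are hypothetical placeholders, not actual Novita AI pricing; consult the Pricing Page for real per-model rates.

```python
# Minimal cost-estimation sketch. Rates are expressed per million tokens,
# a common convention; the numbers below are made up for illustration.

def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Return the request cost in USD given token counts and
    per-million-token rates."""
    return (input_tokens * input_rate_per_m +
            output_tokens * output_rate_per_m) / 1_000_000

# Example: 1,200 prompt tokens and 350 completion tokens at
# hypothetical rates of $0.20 (input) / $0.60 (output) per million tokens.
cost = estimate_cost(1200, 350, 0.20, 0.60)
print(f"${cost:.6f}")  # -> $0.000450
```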
HTTP Status Codes
| HTTP Status | Name | Description | Charged |
|---|---|---|---|
| 200 | Success | Request completed successfully | Yes |
| 400 | Bad Request | Invalid request parameters | No |
| 401 | Unauthorized | Invalid or missing API key | No |
| 403 | Forbidden | Insufficient permissions | No |
| 429 | Rate Limited | TPM or RPM limit exceeded | No |
| 499 | Client Disconnected | Client closed the connection before completion | Yes |
| 500 | Internal Server Error | Server-side error | No |
| 503 | Service Unavailable | Service temporarily unavailable | No |
| 504 | Gateway Timeout | Upstream response timed out | No |
- Requests rejected before reaching the model (400/401/403/429): not charged
- Platform or upstream failures (500/503/504): not charged
- Requests where the model has begun inference (200/499): charged
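The charged/not-charged split above has a practical consequence for client retry logic: statuses where the model never ran can be retried without double billing. A minimal sketch of that mapping, mirroring the table (status groupings are this sketch's assumption, not an official client):

```python
# Per the billing table: only 200 and 499 mean the model actually ran,
# so only those statuses are charged.
CHARGED_STATUSES = {200, 499}

# Rejected or failed before/upstream of inference: not charged, so a
# retry (with backoff for 429) will not duplicate any charge.
RETRYABLE_STATUSES = {429, 500, 503, 504}

def is_charged(status: int) -> bool:
    return status in CHARGED_STATUSES

def should_retry(status: int) -> bool:
    # 400/401/403 are excluded: retrying an invalid request cannot succeed.
    return status in RETRYABLE_STATUSES
```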
499 - Client Disconnected
Once a request reaches the model, inference begins immediately on the server side. If the client disconnects before receiving the full response, compute resources have already been consumed, so the request is charged based on actual token usage.

| Request Mode | Billing |
|---|---|
| Non-Streaming | Charged for actual token usage regardless of disconnect timing |
| Streaming | Charged for actual token usage regardless of disconnect timing |
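Under this rule, a 499 is billed for the prompt plus whatever the model had generated by the time of the disconnect, not for the full requested output. A small sketch of that arithmetic (token counts are illustrative):

```python
# Billing for a request abandoned mid-generation: the charge covers the
# prompt plus tokens actually produced before the disconnect.

def tokens_billed(prompt_tokens: int, generated_before_disconnect: int) -> int:
    """Tokens billed for a request the client closed before completion."""
    return prompt_tokens + generated_before_disconnect

# Client disconnects after 80 of a possible 1,024 output tokens:
print(tokens_billed(500, 80))  # -> 580 tokens billed, not 500 + 1024
```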
FAQ
Why am I charged after disconnecting?
Inference begins as soon as the request reaches the model. Disconnecting does not halt computation already in progress.
How can I avoid 499 charges?
Set the max_tokens parameter to control output length before sending the request.
Where do 499 charges appear?
In your usage dashboard, alongside standard requests, with actual token consumption recorded.
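The max_tokens guidance above can be sketched as a request payload. The model name and request shape assume an OpenAI-compatible chat completions API; both are illustrative, not a confirmed Novita AI schema:

```python
import json

# Capping output length up front bounds the worst-case charge, including
# on a 499: the model can never bill more than max_tokens output tokens.
payload = {
    "model": "example/model-name",  # hypothetical model id
    "messages": [{"role": "user", "content": "Summarize this document."}],
    "max_tokens": 256,              # hard cap on billable output tokens
    "stream": False,
}
body = json.dumps(payload)
# POST `body` to the chat completions endpoint with your API key in the
# Authorization header; usage.completion_tokens in the response will not
# exceed max_tokens.
```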