# Pricing
The Novita AI LLM API is billed per token:

- Input Tokens — tokens consumed by the prompt
- Output Tokens — tokens generated by the model
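Because input and output tokens are typically priced at different per-million rates, a request's cost can be estimated as a weighted sum of both counts. A minimal sketch follows; the rates used here are hypothetical placeholders, not Novita AI's actual prices — check the pricing page for per-model values:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Estimate request cost in dollars.

    input_rate / output_rate are hypothetical prices per 1M tokens;
    substitute the real per-model rates from the pricing page.
    """
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# e.g. 1,200 prompt tokens and 300 completion tokens
# at assumed rates of $0.50 / $1.50 per 1M tokens:
cost = estimate_cost(1200, 300, 0.50, 1.50)  # ≈ $0.00105
```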
## Billing Conditions
Charges apply when the model has begun inference. Requests that do not reach the model are not charged.

| Scenario | Charged | Reason |
|---|---|---|
| Successful request (200) | Yes | Model completed inference |
| Client disconnected (499) | Yes | Model had begun inference |
| Invalid request / auth failure / rate limit (400/401/403/429) | No | Request rejected before reaching the model |
| Platform or upstream failure (500/503/504) | No | Infrastructure error; absorbed by the platform |
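The table above can be encoded as a small helper for client-side cost reconciliation. This is a sketch only, not an official SDK: statuses outside the table are rejected rather than guessed at:

```python
def is_charged(status: int) -> bool:
    """Return True if a request with this HTTP status incurs token
    charges, per the billing conditions table above."""
    if status in (200, 499):
        # Inference completed (200) or had already begun (499).
        return True
    if status in (400, 401, 403, 429):
        # Rejected before reaching the model.
        return False
    if status in (500, 503, 504):
        # Infrastructure error; absorbed by the platform.
        return False
    raise ValueError(f"status {status} is not covered by the billing table")
```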
## 499 (Client Disconnected)
When a request reaches the model, inference starts immediately. If the client disconnects mid-request, the compute resources already consumed are still billable.

| Request Mode | Billing |
|---|---|
| Non-Streaming | Charged for actual token usage regardless of disconnect timing |
| Streaming | Charged for actual token usage regardless of disconnect timing |
## Recommendations
- Set `max_tokens` to control output length at the request level
- Configure a client timeout of at least 60 seconds to prevent unintended disconnections
- Select appropriate models based on task requirements to optimize cost
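The first two recommendations can be applied directly when constructing a request. Below is a minimal sketch assuming an OpenAI-style chat-completion payload; the model id is a placeholder, not a real Novita AI model name:

```python
def build_chat_request(prompt: str, max_tokens: int = 256,
                       timeout_s: float = 60.0):
    """Build a chat-completion payload plus a client timeout.

    Assumptions: the API accepts an OpenAI-style body with a
    `max_tokens` field, and "example-model" stands in for a real id.
    """
    payload = {
        "model": "example-model",
        "messages": [{"role": "user", "content": prompt}],
        # Caps output tokens, bounding the worst-case output charge.
        "max_tokens": max_tokens,
    }
    # A timeout of at least 60s avoids premature client disconnects,
    # which would still be billed as 499s.
    return payload, timeout_s

payload, timeout = build_chat_request("Summarize this document.", max_tokens=128)
```

Passing `timeout` to the HTTP client (e.g. `requests.post(..., timeout=timeout)`) keeps the disconnect behavior explicit rather than relying on library defaults.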
## FAQ
### Why am I charged if I disconnected before receiving a response?
Inference begins upon request arrival. Disconnecting does not stop computation already in progress.
### How can I avoid 499 charges?
Use the `max_tokens` parameter to cap output length before sending the request.

### Where do 499 charges appear in my account?
They appear in the usage dashboard, listed alongside standard requests with their actual token consumption.