Pricing

The novita AI LLM API is billed per token. Each request incurs charges for:
  • Input Tokens — tokens consumed by the prompt
  • Output Tokens — tokens generated by the model
Total Cost = Input Tokens × Input Rate + Output Tokens × Output Rate
Rates vary by model. Refer to the Pricing Page for details.
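The formula can be sketched in Python; the rates below are illustrative placeholders, not actual Novita AI pricing:

```python
def total_cost(input_tokens: int, output_tokens: int,
               input_rate: float, output_rate: float) -> float:
    """Cost of one request; rates are expressed per token."""
    return input_tokens * input_rate + output_tokens * output_rate

# Hypothetical rates (USD per token), for illustration only:
cost = total_cost(1_000, 500, input_rate=0.0000005, output_rate=0.0000015)
```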

HTTP Status Codes

HTTP Status | Name | Description | Charged
200 | Success | Request completed successfully | Yes
400 | Bad Request | Invalid request parameters | No
401 | Unauthorized | Invalid or missing API key | No
403 | Forbidden | Insufficient permissions | No
429 | Rate Limited | TPM or RPM limit exceeded | No
499 | Client Disconnected | Client closed the connection before completion | Yes
500 | Internal Server Error | Server-side error | No
503 | Service Unavailable | Service temporarily unavailable | No
504 | Gateway Timeout | Upstream response timed out | No
Billing principle:
  • Requests rejected before reaching the model (400/401/403/429): not charged
  • Platform or upstream failures (500/503/504): not charged
  • Requests where the model has begun inference (200/499): charged
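A client can encode this billing principle when deciding how to handle a failed request. This is a sketch; the retry policy below is an assumption, not platform guidance:

```python
# Statuses the platform does not bill (per the table above).
UNCHARGED = {400, 401, 403, 429, 500, 503, 504}

# Assumed retryable set: rate limits and transient server failures.
RETRYABLE = {429, 500, 503, 504}

def is_charged(status: int) -> bool:
    # 200 and 499 are charged because inference has already started.
    return status not in UNCHARGED

def should_retry(status: int) -> bool:
    # 400/401/403 indicate a request problem; retrying won't help.
    return status in RETRYABLE
```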

499 — Client Disconnected

Once a request reaches the model, inference begins immediately on the server side. If the client disconnects before receiving the full response, compute resources have already been consumed. The request is charged based on actual token usage.
Request Mode | Billing
Non-Streaming | Charged for actual token usage regardless of disconnect timing
Streaming | Charged for actual token usage regardless of disconnect timing
Recommendations
  • Use max_tokens to limit output length at the request level
  • Set the client timeout to ≥ 60 s to avoid unintended disconnections
  • Prefer controlling generation via max_tokens rather than closing the connection
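A minimal Python sketch applying these recommendations, assuming an OpenAI-compatible chat completions endpoint; the URL and model name are placeholders, not documented values:

```python
import json
import urllib.request

API_URL = "https://api.novita.ai/v3/openai/chat/completions"  # assumed endpoint

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Request body that caps output length at the request level."""
    return {
        "model": "some-model",  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def send(api_key: str, body: dict) -> bytes:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    # A client timeout of at least 60 s avoids unintended disconnects
    # (HTTP 499), which are billed because inference has already started.
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()
```

Capping output with max_tokens ends generation server-side, so you pay only for what you need; closing the connection early does not stop billing.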

FAQ

Q: Does disconnecting stop the model's computation?
A: No. Inference begins as soon as the request reaches the model, and disconnecting does not halt computation already in progress.

Q: How do I limit how many tokens a request can generate?
A: Set the max_tokens parameter to control output length before sending the request.

Q: Where do charged 499 requests appear?
A: In your usage dashboard, alongside standard requests, with the actual token consumption recorded.