Dear novita AI users,

This notice clarifies our LLM API billing policy, with a focus on how disconnected requests (HTTP 499) are handled.

Pricing

novita AI LLM API is billed per token:
  • Input Tokens — tokens consumed by the prompt
  • Output Tokens — tokens generated by the model
Total Cost = (Input Tokens × Input Rate) + (Output Tokens × Output Rate)
Model-specific rates are available on the Pricing Page.
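The formula above can be sketched as a small helper. The rates used here are placeholder values for illustration only; actual model rates are listed on the Pricing Page.

```python
def llm_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Total cost = input tokens x input rate + output tokens x output rate."""
    return input_tokens * input_rate + output_tokens * output_rate

# Example: 1,200 prompt tokens and 300 completion tokens at
# hypothetical rates of $0.20 / $0.60 per million tokens.
cost = llm_cost(1200, 300, 0.20 / 1_000_000, 0.60 / 1_000_000)
print(f"${cost:.6f}")  # prints "$0.000420"
```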

Billing Conditions

Charges apply when the model has begun inference. Requests that do not reach the model are not charged.
Scenario | Charged | Reason
Successful request (200) | Yes | Model completed inference
Client disconnected (499) | Yes | Model had begun inference
Invalid request / auth failure / rate limit (400/401/403/429) | No | Request rejected before reaching the model
Platform or upstream failure (500/503/504) | No | Infrastructure error; absorbed by the platform
Full status code reference: Error Codes.
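The rule behind the table reduces to a simple predicate: a status code is billable only if the model began inference. A minimal restatement, using the codes listed above:

```python
# Which HTTP outcomes are billable, per the table above.
BILLABLE = {200, 499}          # model began (or completed) inference
NOT_BILLABLE = {400, 401, 403, 429, 500, 503, 504}

def is_charged(status_code):
    """True if the request reached the model and inference began."""
    return status_code in BILLABLE

print(is_charged(499))  # disconnected mid-inference: still charged -> True
print(is_charged(429))  # rate-limited before reaching the model -> False
```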

499 (Client Disconnected)

When a request reaches the model, inference starts immediately. If the client disconnects mid-request, the compute resources already consumed are still billable.
Request Mode | Billing
Non-Streaming | Charged for actual token usage, regardless of disconnect timing
Streaming | Charged for actual token usage, regardless of disconnect timing
This policy is consistent with standard industry practice across major LLM providers.

Recommendations

  1. Set max_tokens to control output length at the request level
  2. Configure client timeout ≥ 60s to prevent unintended disconnections
  3. Select appropriate models based on task requirements to optimize cost
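Recommendations 1 and 2 can be sketched with the `requests` library against an OpenAI-compatible chat completions endpoint. The URL, model ID, and API key below are placeholders, not official values; check the API documentation for the real endpoint.

```python
import requests

def build_chat_request(model, prompt, max_tokens=256):
    """Build a request body that caps billable output via max_tokens (rec. 1)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def send_chat(api_key, payload):
    """Send the request with a generous read timeout (rec. 2).

    timeout=(10, 90) means 10s to connect and 90s to read; keeping the
    read timeout at or above 60s avoids client-side disconnects (499)
    while the model is still generating.
    """
    resp = requests.post(
        "https://api.example.com/v1/chat/completions",  # placeholder endpoint
        headers={"Authorization": f"Bearer {api_key}"},
        json=payload,
        timeout=(10, 90),
    )
    resp.raise_for_status()
    return resp.json()

payload = build_chat_request("<model-id>", "Hello", max_tokens=128)
# result = send_chat("<YOUR_API_KEY>", payload)
```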

FAQ

Q: Does disconnecting stop billing for an in-flight request?
A: No. Inference begins upon request arrival, and disconnecting does not stop computation already in progress.

Q: How can I limit the maximum charge for a single request?
A: Use the max_tokens parameter to cap output length before sending the request.

Q: Where do 499 requests appear in my usage records?
A: In the usage dashboard, listed alongside standard requests with their actual token consumption.
For further questions, please contact our support team.

The novita AI Team