Dear novita AI users,

This notice clarifies our LLM API billing policy, with a focus on how disconnected requests (HTTP 499) are handled.

Pricing

novita AI LLM API is billed per token:
  • Input Tokens — tokens consumed by the prompt
  • Output Tokens — tokens generated by the model
Total Cost = (Input Tokens × Input Rate) + (Output Tokens × Output Rate)
Model-specific rates are available on the Pricing Page.
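The formula above can be sketched as a small helper. The rates used here are placeholder values for illustration only; actual model rates are listed on the Pricing Page.

```python
def llm_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Total cost = input tokens x input rate + output tokens x output rate."""
    return input_tokens * input_rate + output_tokens * output_rate

# Example: 1,200 prompt tokens and 300 completion tokens at
# hypothetical rates of $0.20 / $0.60 per million tokens.
cost = llm_cost(1200, 300, 0.20 / 1_000_000, 0.60 / 1_000_000)
print(f"${cost:.6f}")  # prints "$0.000420"
```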

Billing Conditions

Charges apply when the model has begun inference. Requests that do not reach the model are not charged.
Scenario | Charged | Reason
Successful request (200) | Yes | Model completed inference
Client disconnected (499) | Yes | Model had begun inference
Invalid request / auth failure / rate limit (400/401/403/429) | No | Request rejected before reaching the model
Platform or upstream failure (500/503/504) | No | Infrastructure error; absorbed by the platform
Full status code reference: Error Codes.
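The rule behind the table reduces to a simple predicate: a status code is billable only if the model began inference. A minimal restatement, using the codes listed above:

```python
# Which HTTP outcomes are billable, per the table above.
BILLABLE = {200, 499}          # model began (or completed) inference
NOT_BILLABLE = {400, 401, 403, 429, 500, 503, 504}

def is_charged(status_code):
    """True if the request reached the model and inference began."""
    return status_code in BILLABLE

print(is_charged(499))  # disconnected mid-inference: still charged -> True
print(is_charged(429))  # rate-limited before reaching the model -> False
```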

499 (Client Disconnected)

When a request reaches the model, inference starts immediately. If the client disconnects mid-request, the compute resources already consumed are still billable.
Request Mode | Billing
Non-Streaming | Charged for actual token usage, regardless of disconnect timing
Streaming | Charged for actual token usage, regardless of disconnect timing
This policy is consistent with standard industry practice across major LLM providers.

Recommendations

  1. Set max_tokens to control output length at the request level
  2. Configure client timeout ≥ 60s to prevent unintended disconnections
  3. Select appropriate models based on task requirements to optimize cost
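Recommendations 1 and 2 can be sketched with the `requests` library against an OpenAI-compatible chat completions endpoint. The URL, model ID, and API key below are placeholders, not official values; check the API documentation for the real endpoint.

```python
import requests

def build_chat_request(model, prompt, max_tokens=256):
    """Build a request body that caps billable output via max_tokens (rec. 1)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def send_chat(api_key, payload):
    """Send the request with a generous read timeout (rec. 2).

    timeout=(10, 90) means 10s to connect and 90s to read; keeping the
    read timeout at or above 60s avoids client-side disconnects (499)
    while the model is still generating.
    """
    resp = requests.post(
        "https://api.example.com/v1/chat/completions",  # placeholder endpoint
        headers={"Authorization": f"Bearer {api_key}"},
        json=payload,
        timeout=(10, 90),
    )
    resp.raise_for_status()
    return resp.json()

payload = build_chat_request("<model-id>", "Hello", max_tokens=128)
# result = send_chat("<YOUR_API_KEY>", payload)
```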

FAQ

Q: Does disconnecting stop billing for an in-flight request?
A: No. Inference begins upon request arrival, and disconnecting does not stop computation already in progress.

Q: How can I limit the maximum charge for a single request?
A: Use the max_tokens parameter to cap output length before sending the request.

Q: Where do 499 requests appear in my usage records?
A: In the usage dashboard, listed alongside standard requests with their actual token consumption.
For further questions, please contact our support team.

The novita AI Team