Pricing

The novita AI LLM API is billed per token. Each request incurs charges for:
  • Input Tokens — tokens consumed by the prompt
  • Output Tokens — tokens generated by the model
Total Cost = Input Tokens × Input Rate + Output Tokens × Output Rate
Rates vary by model. Refer to the Pricing Page for details.
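The formula can be sketched in Python; the rates below are illustrative placeholders, not actual Novita AI pricing:

```python
def total_cost(input_tokens: int, output_tokens: int,
               input_rate: float, output_rate: float) -> float:
    """Cost of one request; rates are expressed per token."""
    return input_tokens * input_rate + output_tokens * output_rate

# Hypothetical rates (USD per token), for illustration only:
cost = total_cost(1_000, 500, input_rate=0.0000005, output_rate=0.0000015)
```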

HTTP Status Codes

HTTP Status | Name | Description | Charged
200 | Success | Request completed successfully | Yes
400 | Bad Request | Invalid request parameters | No
401 | Unauthorized | Invalid or missing API key | No
403 | Forbidden | Insufficient permissions | No
429 | Rate Limited | TPM or RPM limit exceeded | No
499 | Client Disconnected | Client closed the connection before completion | Yes
500 | Internal Server Error | Server-side error | No
503 | Service Unavailable | Service temporarily unavailable | No
504 | Gateway Timeout | Upstream response timed out | No
Billing principle:
  • Requests rejected before reaching the model (400/401/403/429): not charged
  • Platform or upstream failures (500/503/504): not charged
  • Requests where the model has begun inference (200/499): charged
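A client can encode this billing principle when deciding how to handle a failed request. This is a sketch; the retry policy below is an assumption, not platform guidance:

```python
# Statuses the platform does not bill (per the table above).
UNCHARGED = {400, 401, 403, 429, 500, 503, 504}

# Assumed retryable set: rate limits and transient server failures.
RETRYABLE = {429, 500, 503, 504}

def is_charged(status: int) -> bool:
    # 200 and 499 are charged because inference has already started.
    return status not in UNCHARGED

def should_retry(status: int) -> bool:
    # 400/401/403 indicate a request problem; retrying won't help.
    return status in RETRYABLE
```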

499 — Client Disconnected

Once a request reaches the model, inference begins immediately on the server side. If the client disconnects before receiving the full response, compute resources have already been consumed. The request is charged based on actual token usage.
Request Mode | Billing
Non-Streaming | Charged for actual token usage regardless of disconnect timing
Streaming | Charged for actual token usage regardless of disconnect timing
Recommendations
  • Use max_tokens to limit output length at the request level
  • Set the client timeout to ≥ 60 s to avoid unintended disconnections
  • Prefer controlling generation via max_tokens rather than closing the connection
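A minimal Python sketch applying these recommendations, assuming an OpenAI-compatible chat completions endpoint; the URL and model name are placeholders, not documented values:

```python
import json
import urllib.request

API_URL = "https://api.novita.ai/v3/openai/chat/completions"  # assumed endpoint

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Request body that caps output length at the request level."""
    return {
        "model": "some-model",  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def send(api_key: str, body: dict) -> bytes:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    # A client timeout of at least 60 s avoids unintended disconnects
    # (HTTP 499), which are billed because inference has already started.
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()
```

Capping output with max_tokens ends generation server-side, so you pay only for what you need; closing the connection early does not stop billing.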

FAQ

Q: Does disconnecting stop the model's computation?
A: No. Inference begins as soon as the request reaches the model, and disconnecting does not halt computation already in progress.

Q: How do I limit how many tokens a request can generate?
A: Set the max_tokens parameter to control output length before sending the request.

Q: Where do charged 499 requests appear?
A: In your usage dashboard, alongside standard requests, with the actual token consumption recorded.