Rate Limits of the LLM API
Rate limits control how frequently users can make requests to our LLM API within specific time periods. Understanding and working within these limits is essential for optimal API usage.
1. Understanding Rate Limits
What Are Rate Limits?
Rate limits restrict the number of API requests that can be made within defined time periods. They help:
- Prevent API abuse and misuse;
- Ensure fair resource distribution among users;
- Maintain consistent API performance and reliability;
- Protect the stability of our services.
Default Rate Limits
Each account has a default rate limit of 300 requests per minute (RPM) per model. This means you can make up to 300 requests to each model within one minute, and requests to different models count against separate limits.
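To stay under the default limit on the client side, you can track recent requests separately for each model. The following is a minimal sketch of a sliding-window limiter that allows at most 300 requests per model in any 60-second span. The class name `PerModelRateLimiter` and its parameters are hypothetical. The documentation does not say whether the server counts requests in fixed or sliding minutes. A sliding window is the more conservative assumption, because a client that never sends more than 300 requests in any 60-second span also never exceeds 300 in any single calendar minute.

```python
import collections
import threading
import time


class PerModelRateLimiter:
    """Client-side sliding-window limiter: at most `limit` requests per model
    within any `window`-second span. Defaults mirror the documented 300 RPM per model."""

    def __init__(self, limit=300, window=60.0, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window = window
        self._clock = clock  # injectable so the limiter can be tested with a fake clock
        self._sleep = sleep
        self._sent = collections.defaultdict(collections.deque)  # model -> send timestamps
        self._lock = threading.Lock()

    def _reserve(self, model):
        """Reserve a slot and return 0.0, or return the seconds until one frees up."""
        with self._lock:
            now = self._clock()
            stamps = self._sent[model]
            # Drop timestamps that have slid out of the current window.
            while stamps and now - stamps[0] >= self.window:
                stamps.popleft()
            if len(stamps) < self.limit:
                stamps.append(now)
                return 0.0
            # The oldest request leaves the window after this many seconds.
            return self.window - (now - stamps[0])

    def try_acquire(self, model):
        """Non-blocking: return True and record the request if it fits in the window."""
        return self._reserve(model) == 0.0

    def acquire(self, model):
        """Block until a request slot for `model` is available, then reserve it."""
        while (wait := self._reserve(model)) > 0:
            self._sleep(wait)
```

In an application you would create one shared limiter and call `limiter.acquire("your-model-name")` immediately before each API request; the call sleeps until that model has a free slot. Because each model gets its own deque of timestamps, traffic to one model never delays requests to another.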
2. Handling Rate Limits
How to Detect Rate Limiting?
When you exceed the rate limit, the API will return:
- HTTP Status Code: 429 (Too Many Requests);
- A rate limit exceeded message in the response body.
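A minimal sketch of detecting this case with Python's standard `urllib`, assuming a plain HTTP request to the API. The helper `send` and the `RateLimitError` exception are hypothetical names. The documentation does not specify the exact wording of the response body or whether a `Retry-After` header is included, so the sketch keeps the body as the error message and reads the header only if it is present.

```python
import urllib.error
import urllib.request


class RateLimitError(Exception):
    """Raised when the API answers with HTTP 429 (Too Many Requests)."""

    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after  # seconds, only if the server sent Retry-After


def send(request, opener=urllib.request.urlopen, timeout=30):
    """Send `request` and return the response body; raise RateLimitError on 429."""
    try:
        with opener(request, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code != 429:
            raise  # other HTTP errors are not rate limiting; re-raise unchanged
        body = err.read().decode("utf-8", errors="replace") if err.fp is not None else ""
        header = err.headers.get("Retry-After") if err.headers else None
        try:
            seconds = float(header) if header is not None else None
        except ValueError:  # Retry-After may also be an HTTP date; ignored in this sketch
            seconds = None
        raise RateLimitError(body, retry_after=seconds) from err
```

The `opener` parameter defaults to `urllib.request.urlopen` and exists so the helper can be exercised without network access. Raising a dedicated exception lets retry logic catch rate limiting specifically, without swallowing unrelated errors such as authentication failures.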
Best Practices
To avoid hitting rate limits:
- Implement request throttling in your application;
- Add exponential backoff for retries;
- Monitor your API usage patterns.
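Exponential backoff can be sketched as a small retry wrapper. This is a minimal example, assuming your client raises an exception when it receives a 429 response; `RateLimitError` here is a placeholder for whatever your client actually raises. The function name `call_with_backoff` and its defaults (5 retries, 1-second base delay, 60-second cap) are illustrative choices, not values recommended for this API. The sketch uses "full jitter": each wait is a random fraction of the exponentially growing cap.

```python
import random
import time


class RateLimitError(Exception):
    """Placeholder for however your client surfaces an HTTP 429 response."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, rand=random.random):
    """Call fn(); on RateLimitError, retry with exponentially growing, jittered delays."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            # Cap grows as base, 2*base, 4*base, ... but never exceeds max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random fraction of the cap so that many clients
            # backing off at the same time do not retry in lockstep.
            sleep(cap * rand())
```

For example, `call_with_backoff(lambda: send_chat_request(payload))` retries a request up to five times, where `send_chat_request` stands for your own (hypothetical) request function. The `sleep` and `rand` parameters are injectable only to make the delays observable in tests.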
When You Hit Rate Limits
If you receive a 429 error, you can:
- Retry Later: Wait a short period before retrying your request;
- Optimize Requests: Reduce request frequency;
- Rate Limit Increase: If you need higher rate limits, you can:
- or
If you have any questions, please reach out to us on Discord.