Dear Developer,
We are writing to inform you of upcoming changes to our Serverless Endpoints.
As part of our regular product lifecycle management to ensure high performance and resource optimization, we are deprecating two batches of models: an immediate batch effective today (May 22, 2026) and a scheduled batch on June 5, 2026 (UTC).
For the scheduled batch, you have 14 days from today to update your integration before these models are removed from the Serverless Endpoints.
Impacted Models & Recommended Alternatives
Please refer to the tables below to see if your application is using any of the impacted models and find their recommended alternatives.
Effective immediately (May 22, 2026)
| Deprecated model | Recommended alternatives | Notes |
|---|---|---|
| baidu/ernie-4.5-21B-a3b | qwen/qwen3.5-35b-a3b, qwen/qwen3.6-35b-a3b | - |
| baidu/ernie-4.5-vl-28b-a3b | qwen/qwen3.5-35b-a3b, qwen/qwen3.6-35b-a3b | - |
| baidu/ernie-4.5-300b-a47b-paddle | deepseek/deepseek-v3.2 | - |
| baidu/ernie-4.5-21B-a3b-thinking | qwen/qwen3.5-35b-a3b, qwen/qwen3.6-35b-a3b | - |
| qwen/qwen2.5-7b-instruct | qwen/qwen3.5-27b, qwen/qwen3.6-27b | - |
| qwen/qwen2.5-vl-72b-instruct | qwen/qwen3.5-27b, qwen/qwen3.6-27b | - |
| qwen/qwen2.5-32b-instruct | qwen/qwen3.5-27b, qwen/qwen3.6-27b | - |
| qwen/qwen3-4b | qwen/qwen3.5-27b, qwen/qwen3.6-27b | - |
Scheduled for June 5, 2026
| Deprecated model | Recommended alternatives | Notes |
|---|---|---|
| qwen/qwen3-30b-a3b | qwen/qwen3.5-35b-a3b, qwen/qwen3.6-35b-a3b | Also available on Dedicated Endpoints |
| qwen/qwen3-32b | qwen/qwen3.5-27b, qwen/qwen3.6-27b | Also available on Dedicated Endpoints |
| nousresearch/hermes-2-pro-llama-3-8b | - | Available on Dedicated Endpoints only |
| meta-llama/llama-3.2-1b-instruct | - | Available on Dedicated Endpoints only |
| sao10k/l3-70b-euryale-v2.1 | - | Available on Dedicated Endpoints only |
| deepseek/deepseek-ocr | deepseek/deepseek-ocr-2 | - |
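To check whether an application references any of the impacted models, one option is to scan its configuration or source text for the model IDs listed in the tables above. The following is a minimal sketch, not an official tool: the function name `find_deprecated_models` is hypothetical, and case-insensitive matching is an assumption made so that variants such as `21b` and `21B` are both flagged.

```python
import re

# Model IDs copied from the two tables in this notice.
DEPRECATED_MODELS = [
    "baidu/ernie-4.5-21B-a3b",
    "baidu/ernie-4.5-vl-28b-a3b",
    "baidu/ernie-4.5-300b-a47b-paddle",
    "baidu/ernie-4.5-21B-a3b-thinking",
    "qwen/qwen2.5-7b-instruct",
    "qwen/qwen2.5-vl-72b-instruct",
    "qwen/qwen2.5-32b-instruct",
    "qwen/qwen3-4b",
    "qwen/qwen3-30b-a3b",
    "qwen/qwen3-32b",
    "nousresearch/hermes-2-pro-llama-3-8b",
    "meta-llama/llama-3.2-1b-instruct",
    "sao10k/l3-70b-euryale-v2.1",
    "deepseek/deepseek-ocr",
]


def find_deprecated_models(text: str) -> list[str]:
    """Return the deprecated model IDs that appear in `text`, in table order."""
    found = []
    for model in DEPRECATED_MODELS:
        # Reject hits that are only part of a longer ID, so that
        # "deepseek/deepseek-ocr-2" does not count as "deepseek/deepseek-ocr".
        pattern = r"(?<![\w./-])" + re.escape(model) + r"(?![\w-]|\.\w)"
        # Assumption: IDs are compared case-insensitively.
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(model)
    return found
```

Running this over environment files, deployment manifests, and request-building code is usually enough to locate hard-coded model names; IDs assembled at runtime would need a check at the call site instead.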
What actions should you take?
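To avoid service interruption, please choose one of the following options. Models in the immediate batch have already reached End of Life on Serverless Endpoints, so migrate those right away; for the scheduled batch, complete the change before June 5, 2026 (UTC).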
Option 1: Switch to the Recommended Alternative
For deprecated models that have a Serverless alternative listed above, update your API code to point to the recommended alternative.
These newer models offer improved reasoning capabilities, faster inference speeds, and better cost-efficiency.
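One way to apply Option 1 without editing every call site is to route model names through a small lookup before building a request. The sketch below is illustrative only: `resolve_model` is a hypothetical helper, the mapping is copied from the tables in this notice, and returning the first listed alternative is an arbitrary choice rather than a recommendation of one alternative over the other.

```python
QWEN_35B_A3B = ["qwen/qwen3.5-35b-a3b", "qwen/qwen3.6-35b-a3b"]
QWEN_27B = ["qwen/qwen3.5-27b", "qwen/qwen3.6-27b"]

# Deprecated model -> Serverless alternatives, as listed in this notice.
SERVERLESS_ALTERNATIVES: dict[str, list[str]] = {
    "baidu/ernie-4.5-21B-a3b": QWEN_35B_A3B,
    "baidu/ernie-4.5-vl-28b-a3b": QWEN_35B_A3B,
    "baidu/ernie-4.5-300b-a47b-paddle": ["deepseek/deepseek-v3.2"],
    "baidu/ernie-4.5-21B-a3b-thinking": QWEN_35B_A3B,
    "qwen/qwen2.5-7b-instruct": QWEN_27B,
    "qwen/qwen2.5-vl-72b-instruct": QWEN_27B,
    "qwen/qwen2.5-32b-instruct": QWEN_27B,
    "qwen/qwen3-4b": QWEN_27B,
    "qwen/qwen3-30b-a3b": QWEN_35B_A3B,
    "qwen/qwen3-32b": QWEN_27B,
    "deepseek/deepseek-ocr": ["deepseek/deepseek-ocr-2"],
}

# Deprecated models for which the notice lists no Serverless alternative.
NO_SERVERLESS_ALTERNATIVE = {
    "nousresearch/hermes-2-pro-llama-3-8b",
    "meta-llama/llama-3.2-1b-instruct",
    "sao10k/l3-70b-euryale-v2.1",
}


def resolve_model(model: str) -> str:
    """Return the Serverless model ID to request in place of `model`."""
    if model in NO_SERVERLESS_ALTERNATIVE:
        raise LookupError(f"{model} has no Serverless alternative in this notice")
    alternatives = SERVERLESS_ALTERNATIVES.get(model)
    if alternatives is None:
        return model  # not covered by this notice, leave unchanged
    # Assumption: take the first listed alternative; either may suit your workload.
    return alternatives[0]
```

For example, `resolve_model("qwen/qwen3-32b")` returns `"qwen/qwen3.5-27b"`. Because a replacement model can respond differently from the one it replaces, it is worth re-running your own evaluation prompts after switching.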
Option 2: Continue Using Specific Models via Dedicated Endpoints
Several models in the June 5, 2026 batch remain available on Dedicated Endpoints:
- Dedicated Endpoints only (no Serverless alternative):
nousresearch/hermes-2-pro-llama-3-8b, meta-llama/llama-3.2-1b-instruct, sao10k/l3-70b-euryale-v2.1
- Also available on Dedicated Endpoints (alongside Serverless alternatives):
qwen/qwen3-30b-a3b, qwen/qwen3-32b
If your workflow requires any of these specific versions (e.g., for reproducibility or specific fine-tuning), you can deploy them on your own private GPU resources using our Dedicated Endpoints.
Timeline
- Announcement Date: May 22, 2026
- Immediate End of Life (EOL): May 22, 2026 (UTC), effective upon announcement
- Scheduled End of Life (EOL): June 5, 2026 (UTC)
Note: After each EOL date, API requests to the corresponding deprecated models via Serverless Endpoints will result in an error.
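The timeline can also be encoded as a guard that runs before a request is sent, so that callers fail early instead of receiving an API error. This is a sketch under one stated assumption: the notice gives EOL dates but no time of day, so the cutoff is taken to be 00:00 UTC on the EOL date. The names `EOL_DATES` and `is_past_eol` are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

ANNOUNCED = date(2026, 5, 22)
EOL_DATES = {
    "immediate": date(2026, 5, 22),
    "scheduled": date(2026, 6, 5),
}


def is_past_eol(batch: str, now: datetime) -> bool:
    """Return True once `now` has reached the batch's EOL date in UTC.

    `now` must be timezone-aware; it is converted to UTC before comparing.
    """
    # Assumption: the cutoff is 00:00 UTC on the EOL date.
    return now.astimezone(timezone.utc).date() >= EOL_DATES[batch]


# The 14-day window stated for the scheduled batch matches the two dates.
assert ANNOUNCED + timedelta(days=14) == EOL_DATES["scheduled"]
```

In practice, call it as `is_past_eol("scheduled", datetime.now(timezone.utc))`. Converting to UTC first matters for callers in other time zones: 20:00 on June 4 at UTC-5 is already June 5 in UTC.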
If you have any questions or need assistance with the migration, please reach out to us via our Discord community or submit a support ticket.
Best regards,
The Novita AI Team
Last modified on June 10, 2026