LLM Serverless Model Deprecation Notice

Dear Developer,

We are writing to inform you of upcoming changes to our Serverless Endpoints. As part of our regular product lifecycle management to ensure high performance and resource optimization, we are deprecating two batches of models: an immediate batch effective today (May 22, 2026) and a scheduled batch on June 5, 2026 (UTC). For the scheduled batch, you have 14 days from today to update your integration before these models are removed from the Serverless Endpoints.

Impacted Models & Recommended Alternatives

Please refer to the tables below to see if your application is using any of the impacted models and find their recommended alternatives.

Effective Immediately (May 22, 2026)

Deprecated model	Recommended alternatives	Notes
`baidu/ernie-4.5-21B-a3b`	`qwen/qwen3.5-35b-a3b`, `qwen/qwen3.6-35b-a3b`	—
`baidu/ernie-4.5-vl-28b-a3b`	`qwen/qwen3.5-35b-a3b`, `qwen/qwen3.6-35b-a3b`	—
`baidu/ernie-4.5-300b-a47b-paddle`	`deepseek/deepseek-v3.2`	—
`baidu/ernie-4.5-21B-a3b-thinking`	`qwen/qwen3.5-35b-a3b`, `qwen/qwen3.6-35b-a3b`	—
`qwen/qwen2.5-7b-instruct`	`qwen/qwen3.5-27b`, `qwen/qwen3.6-27b`	—
`qwen/qwen2.5-vl-72b-instruct`	`qwen/qwen3.5-27b`, `qwen/qwen3.6-27b`	—
`qwen/qwen2.5-32b-instruct`	`qwen/qwen3.5-27b`, `qwen/qwen3.6-27b`	—
`qwen/qwen3-4b`	`qwen/qwen3.5-27b`, `qwen/qwen3.6-27b`	—

Scheduled for June 5, 2026

Deprecated model	Recommended alternatives	Notes
`qwen/qwen3-30b-a3b`	`qwen/qwen3.5-35b-a3b`, `qwen/qwen3.6-35b-a3b`	Also available on Dedicated Endpoints
`qwen/qwen3-32b`	`qwen/qwen3.5-27b`, `qwen/qwen3.6-27b`	Also available on Dedicated Endpoints
`nousresearch/hermes-2-pro-llama-3-8b`	—	Available on Dedicated Endpoints only
`meta-llama/llama-3.2-1b-instruct`	—	Available on Dedicated Endpoints only
`sao10k/l3-70b-euryale-v2.1`	—	Available on Dedicated Endpoints only
`deepseek/deepseek-ocr`	`deepseek/deepseek-ocr-2`	—

What actions should you take?

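To avoid service interruption, please choose one of the following options before the applicable EOL date. Models in the immediate batch are affected as of today, and the deadline for the scheduled batch is June 5, 2026 (UTC):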

Option 1: Switch to the Recommended Alternative

For deprecated models that have a Serverless alternative listed above, update the model ID in your API requests to the recommended alternative. These newer models offer improved reasoning capabilities, faster inference speeds, and better cost-efficiency.
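As a rough illustration of Option 1, the sketch below swaps a deprecated model ID for a recommended alternative before a request is sent. The mapping is transcribed from the tables in this notice. The helper name `migrate_payload`, the `prefer` parameter, and the assumption that your request body carries the model ID in a `model` field are all hypothetical, so adapt them to your own integration.

```python
# Deprecated Serverless model ID -> recommended alternatives, in the order
# listed in this notice. Models with no Serverless alternative are omitted.
QWEN_35B_A3B = ["qwen/qwen3.5-35b-a3b", "qwen/qwen3.6-35b-a3b"]
QWEN_27B = ["qwen/qwen3.5-27b", "qwen/qwen3.6-27b"]

RECOMMENDED_ALTERNATIVES = {
    # Effective immediately (May 22, 2026)
    "baidu/ernie-4.5-21B-a3b": QWEN_35B_A3B,
    "baidu/ernie-4.5-vl-28b-a3b": QWEN_35B_A3B,
    "baidu/ernie-4.5-300b-a47b-paddle": ["deepseek/deepseek-v3.2"],
    "baidu/ernie-4.5-21B-a3b-thinking": QWEN_35B_A3B,
    "qwen/qwen2.5-7b-instruct": QWEN_27B,
    "qwen/qwen2.5-vl-72b-instruct": QWEN_27B,
    "qwen/qwen2.5-32b-instruct": QWEN_27B,
    "qwen/qwen3-4b": QWEN_27B,
    # Scheduled for June 5, 2026
    "qwen/qwen3-30b-a3b": QWEN_35B_A3B,
    "qwen/qwen3-32b": QWEN_27B,
    "deepseek/deepseek-ocr": ["deepseek/deepseek-ocr-2"],
}


def migrate_payload(payload: dict, prefer: int = 0) -> dict:
    """Return a copy of `payload` with a deprecated model ID replaced.

    Assumes (hypothetically) that the model ID lives under the "model" key.
    `prefer` selects which listed alternative to use; it is clamped to the
    last available entry. Payloads for unaffected models, or for models
    without a Serverless alternative, are returned unchanged.
    """
    alternatives = RECOMMENDED_ALTERNATIVES.get(payload.get("model"))
    if not alternatives:
        return dict(payload)
    choice = alternatives[min(prefer, len(alternatives) - 1)]
    return {**payload, "model": choice}
```

For example, `migrate_payload({"model": "qwen/qwen3-32b", "messages": []})` returns a payload whose `model` is `qwen/qwen3.5-27b`. Because the alternatives are different models, it is worth re-checking prompts and output quality after switching rather than treating the swap as a drop-in change.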

Option 2: Continue Using Specific Models via Dedicated Endpoints

Several models in the June 5, 2026 batch remain available on Dedicated Endpoints:

Dedicated Endpoints only (no Serverless alternative): `nousresearch/hermes-2-pro-llama-3-8b`, `meta-llama/llama-3.2-1b-instruct`, `sao10k/l3-70b-euryale-v2.1`
Also available on Dedicated Endpoints (alongside Serverless alternatives): `qwen/qwen3-30b-a3b`, `qwen/qwen3-32b`

If your workflow requires any of these specific versions (e.g., for reproducibility or specific fine-tuning), you can deploy them on your own private GPU resources using our Dedicated Endpoints.

Benefit: This guarantees exclusive access, zero cold starts, and consistent performance.
Action: Deploy on Dedicated Endpoints at https://novita.ai/dedicated-endpoint

Timeline

Announcement Date: May 22, 2026
Immediate End of Life (EOL): May 22, 2026 (UTC) — effective upon announcement
Scheduled End of Life (EOL): June 5, 2026 (UTC)
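The timeline above can be turned into a small check for your own monitoring or CI. This is only a sketch: the dates and the scheduled-batch model list come from this notice, while the function names are hypothetical and the dates are compared as plain UTC calendar dates.

```python
from datetime import date, datetime, timezone

# Dates taken from the timeline in this notice (UTC).
ANNOUNCEMENT_DATE = date(2026, 5, 22)
SCHEDULED_EOL = date(2026, 6, 5)

# Models in the scheduled (June 5, 2026) batch, as listed in this notice.
SCHEDULED_BATCH = {
    "qwen/qwen3-30b-a3b",
    "qwen/qwen3-32b",
    "nousresearch/hermes-2-pro-llama-3-8b",
    "meta-llama/llama-3.2-1b-instruct",
    "sao10k/l3-70b-euryale-v2.1",
    "deepseek/deepseek-ocr",
}


def days_until_scheduled_eol(today: date) -> int:
    """Days left before the scheduled EOL; zero or negative once it is reached."""
    return (SCHEDULED_EOL - today).days


def scheduled_models_in_use(models_in_use) -> list:
    """Return, sorted, the models you use that belong to the scheduled batch."""
    return sorted(set(models_in_use) & SCHEDULED_BATCH)


if __name__ == "__main__":
    # Use the current UTC date, since the EOL dates are stated in UTC.
    today_utc = datetime.now(timezone.utc).date()
    print("Days until scheduled EOL:", days_until_scheduled_eol(today_utc))
```

Calling `days_until_scheduled_eol(ANNOUNCEMENT_DATE)` gives 14, matching the 14-day window stated at the top of this notice.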

Note: After each EOL date, API requests to the corresponding deprecated models via Serverless Endpoints will result in an error.

If you have any questions or need assistance with the migration, please reach out to us via our Discord community or submit a support ticket.

Best regards,
The Novita AI Team

Last modified on May 22, 2026
