May 25, 2026 - Documentation

Dedicated Endpoint

On-Demand Deployment. Introduced a “Deploy on Demand” button in the Model Library, enabling one-click conversion from a Shared API model to a Dedicated Endpoint with auto-filled configuration.
RTX 5090 Support. Added support for RTX 5090 (32GB, Blackwell architecture), providing a cost-effective GPU option for small-to-medium models (≤32B parameters).
LoRA Management. Introduced LoRA adapter management for Dedicated Endpoints. You can now add, remove, and update LoRA adapters within your endpoint configuration, with alias support for easier organization and referencing across multiple LoRA setups.
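
As a rough illustration of the RTX 5090 sizing note above, the sketch below estimates the memory needed for model weights alone at different precisions. The function name and the precision choices are illustrative assumptions, not part of the platform; the figures ignore KV cache, activations, and runtime overhead, so real headroom is smaller.

```python
# Minimal sketch: weight-only memory estimate (assumption: decimal GB,
# no KV cache, activations, or framework overhead included).

def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

GPU_MEMORY_GB = 32  # RTX 5090 memory, as stated in the release note

if __name__ == "__main__":
    for bits in (16, 8, 4):  # hypothetical precisions for comparison
        need = weight_memory_gb(32, bits)
        fits = need < GPU_MEMORY_GB
        print(f"32B model @ {bits}-bit: ~{need:.0f} GB of weights, fits: {fits}")
```

Under these assumptions, a model at the upper end of the stated range would only fit in 32GB with reduced-precision weights, while smaller models leave more room for context.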

Deployment Reliability. Raised the deployment success rate to ≥95%, making endpoint provisioning more consistent and stable.
Transparent Context Length. Each model's maximum context length is now clearly displayed, and GPU recommendations are based on full context capacity rather than minimum requirements.
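
To show why sizing for full context rather than minimum requirements can change a GPU recommendation, here is a minimal sketch of the standard KV-cache size estimate for a transformer. The model shape (64 layers, 8 KV heads, head dimension 128, 16-bit cache) is hypothetical, and the platform's actual recommendation logic is not documented here.

```python
# Minimal sketch: KV-cache size for a single sequence.
# Assumption: standard attention cache holding one key and one value
# vector per layer, per KV head, per token.

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size in GiB for one sequence."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # 2 = K and V
    return per_token * context_len / 2**30

if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    shape = dict(layers=64, kv_heads=8, head_dim=128)
    for ctx in (4096, 131072):
        print(f"context {ctx}: ~{kv_cache_gib(context_len=ctx, **shape):.0f} GiB KV cache")
```

For this hypothetical shape the cache grows from about 1 GiB at 4,096 tokens to about 32 GiB at 131,072 tokens, which is why a GPU that is sufficient at a short context may not be at the full one.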