Blog
Product news, partnerships and research from Novita AI.

Run Harbor Agent Evaluations on Novita Agent Sandbox
Support for Novita Agent Sandbox is visible on Harbor's main branch. Learn where the release boundary lies before using it in agent evaluations.

How to Use Novita AI with Goose: 200+ LLM Models
Configure Novita AI as a native provider in Goose. Access 200+ open-source models at $0.02/M tokens for agentic coding workflows.

How to Use Novita AI API in ForgeCode: Setup Guide
Set up Novita AI in ForgeCode to access Kimi-K2.5, GLM-5, and MiniMax-M2.5. Step-by-step guide with model comparison and pricing.

Novita AI × CLI-Anything: Agent-Native CLI for Every Model
Install cli-anything-novita and give your AI agent CLI access to every model on Novita AI — DeepSeek, GLM-5, MiniMax, and more.

How to Use Novita AI with OpenCode: The Ultimate Setup Guide
Unlock OpenCode's full potential with Novita AI. Step-by-step guide on how to connect DeepSeek V3.2, GLM 4.7 & more.

Optimizing GLM4-MoE for Production: 65% Faster TTFT with SGLang
As the state-of-the-art GLM 4.7 model continues to lead in coding performance, Novita AI remains committed to delivering a reliable, efficient, and production-grade GLM service to

Novita AI Partners with Poe to Expand AI Model Access
Novita AI models now available on Poe platform, expanding AI access for millions of users worldwide through integrated model deployment.

How to Use Codex with Novita AI Models: Complete Setup Guide
Step-by-step guide to setting up Codex CLI with Novita AI's powerful models. Configure DeepSeek, Qwen Coder, Kimi K2 for AI-powered coding.

Build a Multi-Agent System with Novita and CrewAI
Build intelligent Multi-Agent Systems with CrewAI and Novita's LLMs. Learn why multiple specialized AI agents outperform single-agent systems.

Trae + Novita AI: Step-by-Step Guide to Access AI Models in Your IDE
Integrate Novita AI with Trae IDE to access DeepSeek R1 and other cutting-edge models. Step-by-step setup guide with cost control.

Novita AI Partners with SGLang to Power Next‐Gen AI Inference
We're excited to announce a strategic partnership with SGLang, a fast serving engine for large language models and vision language models.

Using LlamaIndex with Novita AI: A Step-by-Step Guide
Discover how using LlamaIndex with Novita AI simplifies data connection and boosts efficiency for LLM-powered applications.

Using DocsGPT with Novita AI: A Step-by-Step Guide
Learn how to boost productivity by using DocsGPT with Novita AI for streamlined documentation and improved workflows.

Using Langfuse with Novita AI: A Comprehensive Guide
Novita AI has revolutionized LLM application development through its strategic integration with Langfuse. By combining Novita AI with Langfuse's monitoring capabilities, developers

Using ai-gradio with Novita AI: A Comprehensive Guide
Unlock the potential of ai-gradio and Novita AI for interactive AI demos and APIs in this comprehensive guide.

Novita AI Is Now Available on Hugging Face
Novita AI has joined Hugging Face as a serverless Inference Provider, streamlining AI model deployment.

Using Helicone with Novita AI: A Comprehensive Guide
Explore Helicone with Novita AI and discover how it enhances observability for developers using Large Language Models.

Using Continue with Novita AI: A Comprehensive Guide
Learn about the collaboration between Continue and Novita AI and how it enhances software development with AI-powered tools.

Using Langflow with Novita AI: A Comprehensive Guide
Learn how to integrate Novita AI's LLM APIs with Langflow to simplify and accelerate your AI application development workflow.

Announcing Our Partnership With vLLM to Advance AI Inference
Novita AI partners with vLLM to enhance AI inference with cutting-edge open-source technology, boosting performance and efficiency for developers deploying large language models.

Using gptel with Novita AI: A Comprehensive Guide
Discover how to integrate Novita AI's powerful LLM capabilities with gptel in Emacs, enhancing your development workflow with cutting-edge AI tools.

Using Portkey with Novita AI: A Comprehensive Guide
Learn how to seamlessly integrate Novita AI API with Portkey AI Gateway for enhanced performance, reliability, and scalability in your AI applications.

Using Gepetto with Novita AI: A Comprehensive Guide
Enhance your reverse engineering with Gepetto and Novita AI. Learn to integrate advanced language models into IDA Pro for improved function analysis and variable renaming.

Revolutionizing Large Language Model Inference: Speculative Decoding and Low-Precision Quantization
Learn how speculative sampling and low-precision quantization reduce costs and accelerate inference, offering practical solutions for scalable AI deployment.

Dynamic KV Cache Compression Based on the vLLM Framework
Novita AI speeds up Llama-70B loading with KV sparsity, reducing memory, computation, and I/O overhead for faster inference and minimal accuracy loss.

Using LangChain with Novita AI: A Comprehensive Guide
Learn how to leverage Novita AI's API key with LangChain to build powerful, context-aware AI applications. A comprehensive guide for developers.

Using MINDcraft with Novita AI: A Comprehensive Guide
MINDcraft brings AI-driven creativity to Minecraft. Experience this open-source project that leverages large language models to control bots for complex tasks and autonomous gameplay,

Using AnythingLLM with Novita AI: A Comprehensive Guide
Boost productivity with AnythingLLM and Novita AI, combining advanced language models and PDF processing for secure, efficient AI workflows.

How to Select the Best GPU for LLM Inference: Benchmarking Insights
Discover how to select cost-effective GPUs for large model inference, focusing on performance metrics and best practices to enhance efficiency.

How KV Sparsity Achieves 1.5x Acceleration for vLLM
Boost AI inference speed with KV sparsity. Understand how it works and optimize your models for real-world applications.

Using Clipboard Conqueror with Novita AI: A Comprehensive Guide
Enhance text editing with Clipboard Conqueror and Novita AI. Use AI and ChatGPT seamlessly across any text box to boost productivity.

Dynamic Allocation of GPU Resources for Kubernetes Workloads
Currently, scheduling GPU Pods in Kubernetes (k8s) relies on various extension mechanisms, including Device Plugin, Extended Resource, scheduler extender, scheduler fram

Dynamically Adding Port Mappings to Running Docker Containers
Port mapping is a crucial aspect of developing and deploying containerized applications. Typically, we establish a connection between a container's internal port and a port on the

GPU Container Core Binding Strategy Based on Affinity
Introduction to Optimizing CPU and GPU Performance: In high-performance computing and large-scale parallel task processing, GPUs have become indispensable accelerators. To fully uti

Will Speculative Decoding Harm LLM Inference Accuracy?
Mitchell Stern et al. (2018) introduced the prototype of speculative decoding. This method has since been developed and refined by various approaches, including Looka

Using LobeChat with Novita AI: A Comprehensive Guide
Are you seeking an AI assistant that surpasses conversational chatbots in power and versatility? Look no further than LobeChat.

Using PyTorch with Novita AI: A Comprehensive Guide
A Dynamic Deep Learning Framework for AI Innovations. What is PyTorch? PyTorch, the Python counterpart of Torch, is an open-source machine learning framework developed by Facebook.

Using CUDA with Novita AI: A Comprehensive Guide
Introduction: With the rapid development of artificial intelligence, GPUs have become a focal point in the arms race among major companies. Possessing more GPUs translates to greater

Quantization Methods for 100X Speedup in Large Language Model Inference
Discover how selecting the best data types and optimizing for GPU hardware support open new pathways for speeding up quantized inference.