Blog

Product news, partnerships and research from Novita AI.

Run Harbor Agent Evaluations on Novita Agent Sandbox

Harbor Novita Agent Sandbox support is visible on Harbor main. Learn the release boundary before using it in agent evaluations.

May 19, 2026·6 min read

LLM

Partnerships

How to Use Novita AI with Goose: 200+ LLM Models

Configure Novita AI as a native provider in Goose. Access 200+ open-source models at $0.02/M tokens for agentic coding workflows.

May 18, 2026·5 min read

Partnerships

How to Use Novita AI API in ForgeCode: Setup Guide

Set up Novita AI in ForgeCode to access Kimi-K2.5, GLM-5, and MiniMax-M2.5. Step-by-step guide with model comparison and pricing.

Mar 31, 2026·5 min read

Partnerships

Novita AI × CLI-Anything: Agent-Native CLI for Every Model

Install cli-anything-novita and give your AI agent CLI access to every model on Novita AI — DeepSeek, GLM-5, MiniMax, and more.

Mar 30, 2026·5 min read

Partnerships

How to Use Novita AI with OpenCode: The Ultimate Setup Guide

Unlock OpenCode's full potential with Novita AI. Step-by-step guide on how to connect DeepSeek V3.2, GLM 4.7 & more.

Jan 22, 2026·7 min read

Research

Optimizing GLM4-MoE for Production: 65% Faster TTFT with SGLang

As the state-of-the-art GLM 4.7 model continues to lead in coding performance, Novita AI remains committed to delivering a reliable, efficient, and production-grade GLM service to

Jan 21, 2026·5 min read

Partnerships

Novita AI Partners with Poe to Expand AI Model Access

Novita AI models now available on Poe platform, expanding AI access for millions of users worldwide through integrated model deployment.

Sep 19, 2025·3 min read

Partnerships

How to Use Codex with Novita AI Models: Complete Setup Guide

Step-by-step guide to setting up Codex CLI with Novita AI's powerful models. Configure DeepSeek, Qwen Coder, Kimi K2 for AI-powered coding.

Sep 15, 2025·4 min read

Partnerships

Build a Multi-Agent System with Novita and CrewAI

Build intelligent Multi-Agent Systems with CrewAI and Novita's LLMs. Learn why multiple specialized AI agents outperform single-agent systems.

Aug 20, 2025·11 min read

Partnerships

Trae + Novita AI: Step-by-Step Guide to Access AI Models in Your IDE

Integrate Novita AI with Trae IDE to access DeepSeek R1 and other cutting-edge models. Step-by-step setup guide with cost control.

Jul 20, 2025·6 min read

Partnerships

Novita AI Partners with SGLang to Power Next‐Gen AI Inference

We're excited to announce a strategic partnership with SGLang, a fast serving engine for large language models and vision language models.

May 23, 2025·2 min read

Partnerships

Using LlamaIndex with Novita AI: A Step-by-Step Guide

Discover how using LlamaIndex with Novita AI simplifies data connection and boosts efficiency for LLM-powered applications.

Mar 24, 2025·6 min read

Partnerships

Using DocsGPT with Novita AI: A Step-by-Step Guide

Learn how to enhance productivity by using DocsGPT with Novita AI for streamlined documentation and improved workflows.

Mar 18, 2025·5 min read

Partnerships

Using Langfuse with Novita AI: A Comprehensive Guide

Novita AI has revolutionized LLM application development through its strategic integration with Langfuse. By combining Novita AI with Langfuse's monitoring capabilities, developers

Mar 7, 2025·4 min read

Partnerships

Using ai-gradio with Novita AI: A Comprehensive Guide

Unlock the potential of ai-gradio with Novita AI for interactive AI demos and APIs with our comprehensive guide.

Mar 4, 2025·2 min read

Partnerships

Novita AI Is Now Available on Hugging Face

Explore Novita AI joining Hugging Face as a serverless Inference Provider, streamlining AI model deployment with ease.

Feb 19, 2025·5 min read

Partnerships

Using Helicone with Novita AI: A Comprehensive Guide

Explore Helicone with Novita AI and discover how it enhances observability for developers using Large Language Models.

Feb 18, 2025·5 min read

Partnerships

Using Continue with Novita AI: A Comprehensive Guide

Learn about the collaboration between Continue and Novita AI and how it enhances software development with AI-powered tools.

Jan 22, 2025·6 min read

Partnerships

Using Langflow with Novita AI: A Comprehensive Guide

Learn how to integrate Novita AI's LLM APIs with Langflow to simplify and accelerate your AI application development workflow.

Jan 22, 2025·4 min read

Partnerships

Announcing Our Partnership With vLLM to Advance AI Inference

Novita AI partners with vLLM to enhance AI inference with cutting-edge open-source technology, boosting performance and efficiency for developers deploying large language models.

Jan 21, 2025·2 min read

Partnerships

Using gptel with Novita AI: A Comprehensive Guide

Discover how to integrate Novita AI's powerful LLM capabilities with gptel in Emacs, enhancing your development workflow with cutting-edge AI tools.

Jan 1, 2025·5 min read

Partnerships

Using Portkey with Novita AI: A Comprehensive Guide

Learn how to seamlessly integrate Novita AI API with Portkey AI Gateway for enhanced performance, reliability, and scalability in your AI applications.

Dec 26, 2024·4 min read

Partnerships

Using Gepetto with Novita AI: A Comprehensive Guide

Enhance your reverse engineering with Gepetto and Novita AI. Learn to integrate advanced language models into IDA Pro for improved function analysis and variable renaming.

Dec 25, 2024·5 min read

Research

Revolutionizing Large Language Model Inference: Speculative Decoding and Low-Precision Quantization

Learn how speculative sampling and low-precision quantization reduce costs and accelerate speed, offering practical solutions for scalable AI deployment.

Dec 18, 2024·9 min read

Novita AI

Research

Dynamic KV Cache compression based on vLLM framework

Novita AI speeds up Llama-70B loading with KV sparsity, reducing memory, computation, and I/O overhead for faster inference and minimal accuracy loss.

Dec 13, 2024·3 min read

Partnerships

Using LangChain with Novita AI: A Comprehensive Guide

Learn how to leverage Novita AI's API key with LangChain to build powerful, context-aware AI applications. A comprehensive guide for developers.

Dec 11, 2024·5 min read

Partnerships

Using MINDcraft with Novita AI: A Comprehensive Guide

MINDcraft is AI-driven creativity in Minecraft. Experience this open-source project that leverages large language models to control bots for complex tasks and autonomous gameplay,

Dec 4, 2024·6 min read

Partnerships

Using AnythingLLM with Novita AI: A Comprehensive Guide

Boost productivity with AnythingLLM and Novita AI, combining advanced language models and PDF processing for secure, efficient AI workflows.

Nov 8, 2024·6 min read

Research

How to Select the Best GPU for LLM Inference: Benchmarking Insights

Discover how to select cost-effective GPUs for large model inference, focusing on performance metrics and best practices to enhance efficiency.

Nov 5, 2024·14 min read

Research

How KV Sparsity Achieves 1.5x Acceleration for vLLM

Boost AI inference speed with KV sparsity. Understand how it works and optimize your models for real-world applications.

Oct 25, 2024·13 min read

Partnerships

Using Clipboard Conqueror with Novita AI: A Comprehensive Guide

Enhance text editing with Clipboard Conqueror by Novita AI. Use AI and ChatGPT seamlessly across text boxes to boost productivity with ease.

Oct 25, 2024·8 min read

Research

Dynamic allocation of GPU resources for Kubernetes workloads

Currently, to schedule GPU Pods in Kubernetes (k8s), various extension solutions are put into action, including Device Plugin, Extended Resource, scheduler extender, scheduler fram

Oct 24, 2024·4 min read

Research

Dynamically Adding Port Mappings to Running Docker Containers

Port mapping is a crucial aspect of developing and deploying containerized applications. Typically, we establish a connection between a container's internal port and a port on the

Oct 21, 2024·4 min read

Research

GPU Container Core Binding Strategy Based on Affinity

Introduction to Optimizing CPU and GPU Performance In high-performance computing and large-scale parallel task processing, GPUs have become indispensable accelerators. To fully uti

Aug 26, 2024·4 min read

Research

Will Speculative Decoding Harm LLM Inference Accuracy?

Mitchell Stern et al. 2018 introduced the prototype concept of speculative decoding. This method has since been further developed and refined by various approaches, including Looka

Aug 26, 2024·3 min read

Partnerships

Using LobeChat with Novita AI: A Comprehensive Guide

Are you seeking an AI assistant that surpasses conversational chatbots in power and versatility? Look no further than LobeChat.

Aug 7, 2024·6 min read

Partnerships

Using PyTorch with Novita AI: A Comprehensive Guide

A Dynamic Deep Learning Framework for AI Innovations What is PyTorch? PyTorch, the Python counterpart of Torch, is an open-source machine learning framework developed by Facebook.

Jul 31, 2024·4 min read

Partnerships

Using CUDA with Novita AI: A Comprehensive Guide

Introduction With the rapid development of artificial intelligence, GPUs have become a focal point in the arms race among major companies. Possessing more GPUs translates to greater

Jul 31, 2024·5 min read

Research

Quantization Methods for 100X Speedup in Large Language Model Inference

Discover how selecting the best data types and optimizing GPU hardware support unlocks new pathways for speeding up quantized inference.

Feb 2, 2024·16 min read
