For agents: fetch the complete documentation index at llms.txt. Markdown is available with Accept: text/markdown and with .md URL variants.

The AI-Native Cloud for Builders and Agents

Run models, scale GPUs, and build AI agents, all on one platform.

Start Building

Click to copy instructions for your agent:

Read https://novita.ai/docs/skill.md and follow the instructions.
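
For agents that fetch the docs programmatically, the note at the top of this page says Markdown is served when the request carries Accept: text/markdown. A minimal sketch of such a request, using only the Python standard library, might look like the following. The helper names build_markdown_request and fetch_markdown are illustrative, and the UTF-8 decoding is an assumption about the response encoding.

```python
import urllib.request

# URL quoted on this page; the .md suffix is the Markdown variant.
DOCS_URL = "https://novita.ai/docs/skill.md"


def build_markdown_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks the server for Markdown."""
    return urllib.request.Request(url, headers={"Accept": "text/markdown"})


def fetch_markdown(url: str, timeout: float = 10.0) -> str:
    """Fetch the document body as text (assumes a UTF-8 response)."""
    request = build_markdown_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Performs a real network call; run only where outbound HTTP is allowed.
    print(fetch_markdown(DOCS_URL)[:500])
```

Building the request separately from sending it keeps the header logic easy to check without touching the network.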

Talk to Us

Trusted by

Harbor x Novita Hackathon

Run in Sandbox. Climb the board.

Benchmark your agent with Harbor + Novita Agent Sandbox.

View Details


MODEL APIS

LLM

IMAGE

AUDIO

VIDEO

VISION

MODEL

"KIMI-K2.5"

200+ models

200 ms latency

99.5% uptime

Serverless Model APIs

Run 200+ models through a single API.
No infrastructure to manage.

Text, image, audio, video — all serverless, all
production-ready. You call it, we run it. Billed by the
token, not the hour.

Explore All Models

Deepseek V4 Pro

$1.74/Mt Input · $3.48/Mt Output

1048576 Context

LLM

MiniMax M2.7

$0.3/Mt Input · $1.2/Mt Output

204800 Context

LLM

GLM-5.1

$1.4/Mt Input · $4.4/Mt Output

204800 Context

LLM

Kimi K2.6

$0.95/Mt Input · $4/Mt Output

262144 Context

LLM

Gemma 4 31B

$0.14/Mt Input · $0.4/Mt Output

262144 Context

LLM

Qwen3.5-397B-A17B

$0.6/Mt Input · $3.6/Mt Output

262144 Context

LLM
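
Because billing is per token, the cost of a request can be estimated directly from the prices listed above. The sketch below assumes that "/Mt" means "per million tokens" and copies the listed prices as they appear on this page; the names TokenPrice and request_cost are illustrative and not part of any Novita SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    input_per_mt: float   # USD per million input tokens (assumed meaning of "/Mt")
    output_per_mt: float  # USD per million output tokens


# Prices copied from the model cards on this page; they may change.
PRICES = {
    "Deepseek V4 Pro": TokenPrice(1.74, 3.48),
    "MiniMax M2.7": TokenPrice(0.30, 1.20),
    "GLM-5.1": TokenPrice(1.40, 4.40),
    "Kimi K2.6": TokenPrice(0.95, 4.00),
    "Gemma 4 31B": TokenPrice(0.14, 0.40),
    "Qwen3.5-397B-A17B": TokenPrice(0.60, 3.60),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    price = PRICES[model]
    total = input_tokens * price.input_per_mt + output_tokens * price.output_per_mt
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 10,000-token prompt with a 2,000-token completion.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 10_000, 2_000):.4f}")
```

The estimate ignores any discounts, caching, or minimum charges, none of which are described on this page.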

Dedicated Endpoints

Private endpoints. Guaranteed performance. No noisy neighbors.

Your model. Your compute. Isolated resources mean consistent latency at any throughput. Because production doesn't have a retry budget.

Get Started

AGENT SANDBOX

agent

"coding agents"

coding agent · active

sandbox runtime

Run test suite · pytest

queued

Write fix · patch applied

running

Identify bug · null pointer line 84

done

Read codebase · src/api/routes.py

done

startup: ~200 ms

isolation: Full

billing: per second

status: RUNNING

Agent sandbox

Secure, isolated runtimes. Built for agents that actually do things.

Not a notebook. Not a container you configure yourself. A purpose-built environment where agents run, use tools, call models, and execute tasks — cleanly, in isolation, every time.

Get Started

GPU CLOUD

GPU

flagship

GPU Instances

Full-control GPU machines. Yours in seconds.

Deploy models, run inference, train from scratch, on dedicated GPU instances you fully control. Predictable performance. No shared resources. No surprises.

Serverless GPU

Submit a job. We handle the rest.

No instances to provision. No idle compute to pay for. Novita allocates GPU resources automatically, scales up under load, scales to zero when you're done. You pay for execution, nothing else.

job

queued

running

complete

allocating gpu resources

allocating

12%

allocated

auto

duration: 0.1 s

cost: $0.0001

idle time: $0.00

cluster

"Cluster-01"

CLUSTER-01 · 6 nodes · NVLink · GPUDirect RDMA · PCIe

Node-01: 51%

Node-02: 79%

Node-03: 86%

Node-05: 89%

Node-06: 65%

Node-07: 81%

GPU 8× NVIDIA H200

GPU Memory 141 GB HBM3e per GPU

1.128 TB total

Nodes 6 / 6

Interconnect NVLink 4th Gen · 900 GB/s

Network 400 Gb/s RDMA
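
The memory total above follows from the per-GPU figure: 8 GPUs at 141 GB each give 1,128 GB, or 1.128 TB. The short sketch below reproduces that arithmetic and, as an assumption not stated on this page, extends it to all 6 nodes on the premise that each node carries the same 8-GPU configuration.

```python
GPUS_PER_NODE = 8   # "8x NVIDIA H200"
GB_PER_GPU = 141    # "141 GB HBM3e per GPU"
NODES = 6           # "Nodes 6 / 6"


def node_memory_gb(gpus: int = GPUS_PER_NODE, gb_per_gpu: int = GB_PER_GPU) -> int:
    """Total GPU memory of one node, in GB."""
    return gpus * gb_per_gpu


def cluster_memory_gb(nodes: int = NODES) -> int:
    """Total GPU memory across the cluster, in GB.

    Assumes every node has the same GPU configuration.
    """
    return nodes * node_memory_gb()


if __name__ == "__main__":
    print(f"per node: {node_memory_gb() / 1000:.3f} TB")
    print(f"cluster (assumed uniform): {cluster_memory_gb() / 1000:.3f} TB")
```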

Bare Metal

Maximum performance. Zero abstraction overhead.

Dedicated physical GPU clusters for large-scale inference, training runs, and enterprise deployments that can't compromise on throughput. When you need the hardware to yourself, this is it.

Why Novita AI

Built for AI from day one. Designed for what you're actually building.

Better price-performance

Up to 50% less than major cloud providers. Not because we cut corners, but because we built the infrastructure.

Built for production reliability

Stable infrastructure with low latency, high throughput, and reliable uptime at scale.

One platform for the full AI stack

Model APIs, GPU infrastructure, and agent runtimes — all in one platform.

Scale with your workload

Start small and scale seamlessly from APIs to dedicated clusters.

Dedicated support when it matters

Fast technical support from a team that understands AI infrastructure.

Built with Novita AI

Testimonials

Don't take our word for it.

I appreciate how fast Novita AI moves to deploy newly released models. Their team is often the first to get stable, production-ready inference support online – often on Day One. That speed is critical for the whole open-source AI community.

Julien Chaumond

Co-Founder & CTO

Novita has been a huge help for us at Fish Audio. Their reliable GPU infrastructure allows us to focus on developing and improving our text-to-speech models instead of dealing with hardware headaches. Their support and performance have made it much easier to push our work forward.

Shijia Liao

Co-Founder & Chief Scientist

Novita's Model API was super simple to integrate, and it's been great in powering our AI-driven flashcards and quizzes. The platform takes care of the heavy lifting, so we can focus on building better learning tools for our users without worrying about infrastructure or scaling issues.

Petros Christodoulou

Co-Founder and CEO

Working with Novita AI has been a fantastic experience for Kilo. Their inference platform helps us deliver fast and reliable AI coding workflows across multiple LLMs, with strong real-world performance for agentic workflows. And the team has been remarkably easy to work with! They are always optimizing based on the latest models and technology—a perfect partner for Kilo Code.