MoonshotAI

Kimi K2 0905

moonshotai/kimi-k2-0905
Kimi K2 0905 is the September update of Kimi K2 0711. It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens, extended from the previous 128k. This update improves agentic coding with higher accuracy and better generalization across scaffolds, and enhances frontend coding with more aesthetic and functional outputs for web, 3D, and related tasks. Kimi K2 is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. It excels across coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) benchmarks. The model is trained with a novel stack incorporating the MuonClip optimizer for stable large-scale MoE training.

Features

Serverless API

Docs

moonshotai/kimi-k2-0905 is available via Novita's serverless API, where you pay per token. The API can be called in several ways, including through OpenAI-compatible endpoints.

On-demand Deployments

Docs

On-demand deployments let you run moonshotai/kimi-k2-0905 on dedicated GPUs with a high-performance serving stack, high reliability, and no rate limits.

Available Serverless

Run queries immediately, pay only for usage

Input: $0.6 / M Tokens
Output: $2.5 / M Tokens
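
As a rough illustration, a single request that consumes 100,000 input tokens and produces 10,000 output tokens would cost about 0.1 × $0.6 + 0.01 × $2.5 = $0.085.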

Use the following code examples to integrate with our API:

from openai import OpenAI

# Point the client at Novita's OpenAI-compatible endpoint
client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.novita.ai/openai"
)

response = client.chat.completions.create(
    model="moonshotai/kimi-k2-0905",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ],
    max_tokens=262144,
    temperature=0.7
)

print(response.choices[0].message.content)
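
For longer generations you can stream tokens as they are produced instead of waiting for the full response. The sketch below assumes the OpenAI-compatible endpoint honors the standard stream=True option; the prompt is just a placeholder:

from openai import OpenAI

client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.novita.ai/openai"
)

# Request a streamed completion; stream support is assumed here,
# matching standard OpenAI-compatible behavior.
stream = client.chat.completions.create(
    model="moonshotai/kimi-k2-0905",
    messages=[{"role": "user", "content": "Write a short haiku about autumn."}],
    max_tokens=256,
    temperature=0.7,
    stream=True
)

# Print each content delta as it arrives
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()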

Info

Provider: MoonshotAI
Quantization: fp8

Supported Functionality

Context Length: 262144 tokens
Max Output: 262144 tokens
Structured Output: Supported
Function Calling: Supported
Anthropic API: Supported
Input Capabilities: text
Output Capabilities: text
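
Because function calling is listed as supported, tool use should work through the standard OpenAI tools interface. The sketch below is illustrative only: the get_weather tool and its schema are hypothetical placeholders, and after the model proposes a call you would execute the tool yourself and return the result in a follow-up "tool" message before asking for the final answer.

import json
from openai import OpenAI

client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.novita.ai/openai"
)

# Hypothetical tool definition for illustration; replace with your own functions.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="moonshotai/kimi-k2-0905",
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
    tool_choice="auto"
)

# If the model chose to call a tool, inspect the proposed call and its arguments.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(message.content)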