
Run Kimi-Linear-48B-A3B-Instruct on Novita
What is Kimi-Linear?
Kimi Linear is a hybrid linear attention architecture that outperforms traditional full attention across short-context, long-context, and reinforcement learning (RL) scaling regimes. At its core is Kimi Delta Attention (KDA), a refined version of Gated DeltaNet that introduces a more efficient gating mechanism to make better use of finite-state RNN memory. Kimi Linear delivers superior performance and hardware efficiency, especially for long-context tasks: it reduces KV cache requirements by up to 75% and boosts decoding throughput by up to 6× for contexts as long as 1M tokens.
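For intuition, here is a minimal sketch (in NumPy, not the actual optimized kernel) of what a delta-rule recurrence with fine-grained, per-channel gating looks like. The variable names and the exact placement of the gate are illustrative simplifications on our part, not Kimi Linear's real KDA implementation:

```python
import numpy as np

def gated_delta_rule_sketch(q, k, v, a, beta):
    """Toy single-head recurrence illustrating a fine-grained gated delta rule.

    q, k, v : (T, d) arrays of queries, keys, values
    a       : (T, d) per-channel decay gates in (0, 1) -- the "fine-grained" gating
    beta    : (T,)   per-token write strengths in (0, 1)

    Illustrative simplification only; the real KDA kernel is chunked and
    hardware-optimized rather than a plain Python loop.
    """
    T, d = q.shape
    S = np.zeros((d, d))          # fixed-size "fast weight" memory (does not grow with context)
    out = np.zeros((T, d))
    for t in range(T):
        S = np.diag(a[t]) @ S                       # forget selectively, channel by channel
        S -= beta[t] * np.outer(k[t], k[t] @ S)     # delta-rule erase of the stale association
        S += beta[t] * np.outer(k[t], v[t])         # write the new key-value association
        out[t] = S.T @ q[t]                         # read the memory with the query
    return out
```

Because the state S stays a fixed d×d matrix regardless of sequence length, memory does not grow with context the way a KV cache does, which is where the cache savings claimed above come from.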
Key Features
- Kimi Delta Attention (KDA): A linear attention mechanism that refines the gated delta rule with fine-grained gating.
- Hybrid Architecture: A 3:1 ratio of KDA layers to global MLA (Multi-head Latent Attention) layers reduces memory usage while maintaining or surpassing the quality of full attention; see the sketch after this list for what the interleaving means.
- Superior Performance: Outperforms full attention across a variety of tasks, including long-context and RL-style benchmarks, in fair comparisons on 1.4T-token training runs.
- High Throughput: Achieves up to 6× faster decoding and significantly reduces time per output token (TPOT). See the project's official page for details.
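To make the 3:1 ratio concrete, here is a hypothetical sketch of how such a layer layout could be generated; the actual ordering in the released model may differ, this only illustrates what three KDA layers per global MLA layer means:

```python
def hybrid_layer_pattern(num_layers: int, kda_per_mla: int = 3) -> list:
    """Hypothetical illustration of a 3:1 KDA-to-MLA layer layout.

    Not taken from the model's actual config; it just shows the ratio.
    """
    return [
        "MLA" if (i + 1) % (kda_per_mla + 1) == 0 else "KDA"
        for i in range(num_layers)
    ]

print(hybrid_layer_pattern(8))
# ['KDA', 'KDA', 'KDA', 'MLA', 'KDA', 'KDA', 'KDA', 'MLA']
```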
Run Kimi-Linear-48B-A3B-Instruct on Novita
Step 1: Console Entry
Launch the GPU interface and select Get Started to access deployment management.
Step 2: Package Selection
Locate Kimi-Linear-48B-A3B-Instruct in the template repository and begin the installation sequence.
Step 3: Infrastructure Setup
Configure computing parameters, including memory allocation, storage requirements, and network settings, then proceed to deployment.
Step 4: Review and Create
Double-check your configuration details and cost summary. When satisfied, click Deploy to start the creation process.
Step 5: Wait for Creation
After initiating deployment, the system will automatically redirect you to the instance management page. Your instance will be created in the background.
Step 6: Monitor Download Progress
Track the image download progress in real-time. Your instance status will change from Pulling to Running once deployment is complete. You can view detailed progress by clicking the arrow icon next to your instance name.
Step 7: Verify Instance Status
Click the Logs button to view instance logs and confirm that the model service has started properly.
Step 8: Environmental Access
Open the development environment through the Connect interface, then click Start Web Terminal.
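Once the Web Terminal is open, one quick way to confirm the service is reachable is to list the models it serves. This is a hedged sketch: it assumes the template exposes an OpenAI-compatible API on the endpoint used in the demo below, and that Python with the requests package is available in the instance; adjust the address if your deployment differs.

```python
import requests

# Assumed endpoint, matching the demo below; replace with your instance's address.
ENDPOINT = "http://127.0.0.1:8080"

# List available models on an OpenAI-compatible server.
# If your deployment requires an API key, add an Authorization header.
resp = requests.get(f"{ENDPOINT}/v1/models", timeout=10)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model["id"])  # expect moonshotai/Kimi-Linear-48B-A3B-Instruct to appear
```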
Demo
To access your private model, copy the code below and replace "http://127.0.0.1:8080" with your actual endpoint address.
```bash
curl --request POST \
  --url http://127.0.0.1:8080/v1/chat/completions \
  --header "Authorization: Bearer " \
  --header "Content-Type: application/json" \
  --data '{
    "model": "moonshotai/Kimi-Linear-48B-A3B-Instruct",
    "messages": [
      {"role": "user", "content": "who are you?"}
    ],
    "max_tokens": 128
  }'
```

Example response:

```json
{
  "id": "chatcmpl-de7c4de865e94699b80eb1a0d0bc9f22",
  "object": "chat.completion",
  "created": 1761904682,
  "model": "moonshotai/Kimi-Linear-48B-A3B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm Kimi, a large language model trained by Moonshot AI. I'm here to help you with any questions or tasks you have. How can I assist you today?",
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [],
        "reasoning_content": null
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": 163586,
      "token_ids": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 11,
    "total_tokens": 46,
    "completion_tokens": 35,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "prompt_token_ids": null,
  "kv_transfer_params": null
}
```
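If you prefer calling the endpoint from Python instead of curl, the following sketch sends the same request with the requests library. The endpoint address and the empty API key simply mirror the curl example above; substitute your own values.

```python
import requests

# Replace with your instance's actual endpoint address.
ENDPOINT = "http://127.0.0.1:8080"
API_KEY = ""  # left empty to mirror the curl example; supply a key if your deployment requires one

response = requests.post(
    f"{ENDPOINT}/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "moonshotai/Kimi-Linear-48B-A3B-Instruct",
        "messages": [{"role": "user", "content": "who are you?"}],
        "max_tokens": 128,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```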