XiaomiMiMo/MiMo-V2-Flash

xiaomimimo/mimo-v2-flash
Xiaomi MiMo-V2-Flash is a Mixture-of-Experts (MoE) model developed in-house by Xiaomi and designed for high inference efficiency, with 309B total parameters (15B active). It combines a hybrid attention architecture with multi-layer MTP (multi-token prediction) inference acceleration, and reportedly ranks among the top two open-source models worldwide on multiple agent benchmarks. Its coding capabilities are said to surpass those of all other open-source models and to rival the leading closed-source model, Claude 4.5 Sonnet, at only 2.5% of the inference cost and with 2x the generation speed.

Features

Serverless API

Documentation

xiaomimimo/mimo-v2-flash is available through Novita's serverless API, where you pay per token. The model offers strong reasoning performance and can be called in several ways, including through OpenAI-compatible endpoints.

Serverless available

Run queries immediately, pay only for usage

Input: $0.11 / M tokens
Cache read: $0.024 / M tokens
Output: $0.33 / M tokens

Use the following code example to integrate with our API:
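
As a rough illustration of how the per-token prices above combine, the sketch below estimates the cost of a single request. It assumes that cached input tokens are billed at the cache-read rate instead of the regular input rate; the page does not spell this out, and the function name `estimate_cost_usd` is purely illustrative.

```python
# Prices listed on this page, in USD per million tokens.
INPUT_USD_PER_M = 0.11
CACHE_READ_USD_PER_M = 0.024
OUTPUT_USD_PER_M = 0.33


def estimate_cost_usd(input_tokens, output_tokens, cached_tokens=0):
    """Estimate the cost of one request in USD.

    Assumption: cached_tokens is the part of input_tokens served from
    cache, and it is billed at the cache-read rate rather than the
    input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * INPUT_USD_PER_M
        + cached_tokens * CACHE_READ_USD_PER_M
        + output_tokens * OUTPUT_USD_PER_M
    )
    return total / 1_000_000


# One million input tokens plus one million output tokens, nothing cached.
print(f"{estimate_cost_usd(1_000_000, 1_000_000):.4f}")
```

Actual billing may differ from this estimate, so treat the result as a planning figure only.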

from openai import OpenAI

# Create a client that points at Novita's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.novita.ai/openai"
)

# Send a chat completion request to MiMo-V2-Flash.
response = client.chat.completions.create(
    model="xiaomimimo/mimo-v2-flash",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ],
    max_tokens=32000,  # maximum output length listed for this model
    temperature=0.7
)

# Print the assistant's reply.
print(response.choices[0].message.content)
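
For readers who prefer not to depend on the OpenAI SDK, the same request can be sketched with the Python standard library. This is a minimal sketch that assumes the endpoint follows the usual OpenAI convention of appending `/chat/completions` to the base URL and using a bearer token. The helper names `build_chat_request` and `send` are illustrative and not part of any Novita library.

```python
import json
import urllib.request

BASE_URL = "https://api.novita.ai/openai"
MODEL = "xiaomimimo/mimo-v2-flash"


def build_chat_request(prompt, api_key, max_tokens=1024):
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed OpenAI-style path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the assistant's reply as text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return payload["choices"][0]["message"]["content"]
```

Calling `send(build_chat_request("Hello, how are you?", "<Your API Key>"))` would perform the network call. Building and sending are kept separate so the request can be inspected or logged first.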

Information

Provider: -
Quantization: -

Supported functionality

Context length: 262144
Max output: 32000
Serverless: Supported
Function Calling: Supported
Structured Output: Supported
Reasoning: Supported
Anthropic API: Supported
Input capabilities: text
Output capabilities: text
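
Because function calling is listed as supported, a request with a tool definition might look like the sketch below. It assumes the endpoint accepts the standard OpenAI `tools` / `tool_choice` parameters, which this page does not confirm. The `get_weather` tool and the helper `build_tool_request` are hypothetical examples.

```python
MODEL = "xiaomimimo/mimo-v2-flash"
MAX_OUTPUT_TOKENS = 32000  # "Max output" value listed on this page


def build_tool_request(user_message, max_tokens=4096):
    """Build keyword arguments for a chat completion call with one tool.

    Assumption: the endpoint accepts the OpenAI-style tool schema.
    """
    if max_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError(f"max_tokens must be <= {MAX_OUTPUT_TOKENS}")
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [weather_tool],
        "tool_choice": "auto",  # let the model decide whether to call it
        "max_tokens": max_tokens,
    }
```

With the OpenAI SDK, these arguments could be passed as `client.chat.completions.create(**build_tool_request("What is the weather in Paris?"))`. If the model decides to call the tool, the call should appear in `response.choices[0].message.tool_calls`.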
