---
name: Novitaai
description: Use when building AI applications that need model inference (LLMs, image/video/audio generation), running isolated code sandboxes for agents, or managing GPU compute resources. Reach for this skill when integrating OpenAI-compatible APIs, handling asynchronous tasks, deploying custom models, or executing untrusted code safely.
metadata:
    mintlify-proj: novitaai
    version: "1.0"
---

# Novita AI Skill

## Product summary

Novita AI is a cloud inference platform providing cost-effective access to 10,000+ AI models. Use it to ship LLM APIs (OpenAI-compatible), image/video/audio generation, GPU compute instances, and isolated sandboxes for agent code execution. Key endpoints: `https://api.novita.ai/openai` (LLM), `https://api.novita.ai/v3` (image/video/audio). Authenticate with Bearer tokens in the `Authorization` header. Primary docs: https://novita.ai/docs

## When to use

- **LLM inference**: Chat completions, text generation, embeddings, function calling, structured outputs, reasoning models
- **Media generation**: Text-to-image, image-to-image, text-to-video, image-to-video, text-to-speech, voice cloning
- **Batch processing**: Cost-effective offline inference for large datasets (up to 50k requests per batch, 24-hour completion window)
- **Agent sandboxes**: Securely execute AI-generated code with <200ms startup, multi-language support (Python, JavaScript, C++)
- **GPU compute**: Dedicated instances for long-running workloads or serverless endpoints for burstable tasks
- **Custom models**: Deploy fine-tuned models or LoRA adapters on dedicated endpoints with per-hour billing

## Quick reference

### API Authentication
```bash
Authorization: Bearer {{API_KEY}}
```
Get API keys from https://novita.ai/settings/key-management
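In Python, the header can be built once and reused; a minimal sketch, assuming the key is stored in an environment variable (`NOVITA_API_KEY` is an illustrative name, not an official one):

```python
import os
import requests

def auth_headers() -> dict:
    """Build the Bearer auth header from an environment variable."""
    api_key = os.environ.get("NOVITA_API_KEY", "<KEY>")
    return {"Authorization": f"Bearer {api_key}"}

# Example (uncomment to call the live API):
# resp = requests.get("https://api.novita.ai/openai/v1/models", headers=auth_headers())
# print(resp.status_code)
```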

### LLM Endpoints
| Task | Endpoint | Format |
|------|----------|--------|
| Chat | `/v1/chat/completions` | OpenAI-compatible |
| Completion | `/v1/completions` | OpenAI-compatible |
| Embeddings | `/v1/embeddings` | OpenAI-compatible |
| Batch | `/v1/batches` | JSONL input files |

### Image/Video/Audio Endpoints
| Task | Base URL | Pattern |
|------|----------|---------|
| Text-to-Image | `https://api.novita.ai/v3` | Async (returns task_id) |
| Image-to-Image | `https://api.novita.ai/v3` | Async (returns task_id) |
| Text-to-Video | `https://api.novita.ai/v3` | Async (returns task_id) |
| Task Result | `https://api.novita.ai/v3/async/task-result` | Poll with task_id |

### Common Parameters
- `model`: Model identifier (e.g., `deepseek/deepseek-r1`, `flux-1-schnell`)
- `max_tokens`: Output length limit
- `temperature`: Randomness (0–2, default 0.7)
- `stream`: Boolean for streaming responses
- `task_id`: Returned by async APIs; use to poll results

### Sandbox CLI Commands
```bash
npm install -g novita-sandbox-cli
novita-sandbox-cli sandbox list          # List running sandboxes
novita-sandbox-cli sandbox kill <id>     # Terminate sandbox
```

### GPU Instance Pricing Models
- **On-Demand**: Pay per second, no commitment
- **Spot**: 50% discount, 1-hour grace period, 1-hour advance notice before reclaim
- **Monthly Subscription**: 10%+ discount vs on-demand

## Decision guidance

| Scenario | Use This | Why |
|----------|----------|-----|
| Real-time chat/completion | Streaming LLM API | Low latency, immediate responses |
| Large dataset processing | Batch API | Higher rate limits, cost-effective, 24h window acceptable |
| Image/video generation | Async task API + polling | Async pattern, poll task result with task_id |
| Webhook notifications | Webhook listener | Avoid polling; receive ASYNC_TASK_RESULT events |
| Long-running model | GPU Instance | Dedicated resources, predictable performance |
| Variable workload | Serverless GPU | Auto-scaling, per-second billing, no cold starts |
| Custom/fine-tuned model | Dedicated Endpoint | Exclusive access, zero cold starts, LoRA support |
| Untrusted code execution | Agent Sandbox | System isolation, <200ms startup, multi-language |

## Workflow

### 1. Set up authentication
- Create account at https://novita.ai
- Generate API key in Key Management
- Store securely; include in `Authorization: Bearer {{KEY}}` header

### 2. Choose your integration path
- **LLM**: Use OpenAI SDK with base URL `https://api.novita.ai/openai`
- **Image/Video/Audio**: Call async endpoints, receive task_id, poll task result
- **Batch**: Upload JSONL file, create batch, check status, retrieve results
- **Sandbox**: Use Python/JavaScript SDK or CLI
- **GPU**: Use API or console to create/manage instances
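The batch path's JSONL input can be sketched as below. The `custom_id`/`method`/`url`/`body` field names follow the OpenAI batch input schema, which is an assumption based on the endpoint's OpenAI-compatible format; check the Batch API guide for the exact schema:

```python
import json

def build_batch_jsonl(prompts, model="deepseek/deepseek-r1"):
    """Build JSONL input for the Batch API: one request object per line,
    all targeting the same model (a batch must not mix models)."""
    lines = []
    for i, prompt in enumerate(prompts):
        lines.append(json.dumps({
            "custom_id": f"req-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }))
    return "\n".join(lines)
```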

### 3. Make your first request
For LLM (Python):
```python
from openai import OpenAI
client = OpenAI(base_url="https://api.novita.ai/openai", api_key="<KEY>")
response = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
```

For image generation (async pattern):
```bash
# 1. Submit task
curl -X POST https://api.novita.ai/v3/txt2img \
  -H "Authorization: Bearer <KEY>" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "cat", "model": "flux-1-schnell"}'
# Returns: {"task_id": "xxx"}

# 2. Poll result
curl "https://api.novita.ai/v3/async/task-result?task_id=xxx" \
  -H "Authorization: Bearer <KEY>"
# Returns: {"task": {"status": "TASK_STATUS_SUCCEED"}, "images": [...]}
```

### 4. Handle async tasks
- Image/video/audio APIs return `task_id` immediately
- Poll `/v3/async/task-result?task_id=<id>` until status is `TASK_STATUS_SUCCEED` or `TASK_STATUS_FAILED`
- Alternatively, set up webhook to receive `ASYNC_TASK_RESULT` events
- Results expire after 30 days; download promptly
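The polling loop above can be sketched in Python (hypothetical helper; the terminal status names are the ones this document uses):

```python
import time
import requests

RESULT_URL = "https://api.novita.ai/v3/async/task-result"
TERMINAL = ("TASK_STATUS_SUCCEED", "TASK_STATUS_FAILED")

def is_terminal(status: str) -> bool:
    """A task is done once it has succeeded or failed."""
    return status in TERMINAL

def wait_for_task(task_id: str, api_key: str,
                  interval: float = 2.0, timeout: float = 300.0) -> dict:
    """Poll the task-result endpoint until the task reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = requests.get(
            RESULT_URL,
            params={"task_id": task_id},
            headers={"Authorization": f"Bearer {api_key}"},
        )
        resp.raise_for_status()
        data = resp.json()
        status = data["task"]["status"]
        if status == "TASK_STATUS_FAILED":
            raise RuntimeError(f"task {task_id} failed: {data}")
        if is_terminal(status):
            return data
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} not finished after {timeout}s")
```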

### 5. Monitor and optimize
- Check rate limits in response headers
- Use batch API for non-urgent bulk processing
- Monitor account balance; set up auto-top-up
- Review billing at https://novita.ai/billing

## Common gotchas

- **Async task polling**: Image/video/audio APIs are async. Don't expect immediate results; poll task result endpoint or use webhooks.
- **Task result expiration**: Results are deleted 30 days after completion. Retrieve promptly.
- **Batch file format**: JSONL only, one request per line. All requests in a batch must target the same model.
- **Rate limits**: 429 errors mean you've hit TPM (tokens per minute) or RPM (requests per minute). Retry with backoff or contact support.
- **API key security**: Never commit API keys. Use environment variables or secret managers.
- **Streaming timeout**: Long outputs may timeout without streaming. Use `stream=true` for large responses.
- **Model availability**: Not all models support all features (function calling, structured outputs, reasoning). Check model details page.
- **Sandbox persistence**: Paused sandboxes retain state but consume storage. Resume or kill when done.
- **GPU spot instances**: Can be reclaimed with 1-hour notice. Use on-demand or monthly subscription for critical workloads.
- **Batch expiration**: Batches expire after 24 hours. Incomplete requests are canceled; you only pay for completed ones.
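The rate-limit gotcha above is usually handled with exponential backoff plus jitter; a generic sketch, not a Novita-specific API:

```python
import random
import time

def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 30.0):
    """Yield exponentially growing delays (base, 2*base, 4*base, ...),
    capped at `cap` and randomized to avoid synchronized retries."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.0)

def call_with_retry(fn, max_retries: int = 5):
    """Retry `fn` with backoff; narrow the except clause to your
    client's rate-limit error (e.g. HTTP 429) in real code."""
    last_exc = None
    for delay in backoff_delays(max_retries):
        try:
            return fn()
        except Exception as exc:
            last_exc = exc
            time.sleep(delay)
    raise last_exc
```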

## Verification checklist

Before submitting work:
- [ ] API key is valid and has sufficient balance
- [ ] Correct base URL used (`https://api.novita.ai/openai` for LLM, `https://api.novita.ai/v3` for media)
- [ ] Authorization header includes `Bearer` prefix
- [ ] Model name matches available models (check `/v1/models` endpoint)
- [ ] For async tasks: polling logic handles all status values (`TASK_STATUS_QUEUED`, `TASK_STATUS_PROCESSING`, `TASK_STATUS_SUCCEED`, `TASK_STATUS_FAILED`)
- [ ] Batch JSONL is valid (one request per line, same model per batch)
- [ ] Rate limit handling includes exponential backoff
- [ ] Error responses checked for `BILLING_BALANCE_NOT_ENOUGH` or `RATE_LIMIT_EXCEEDED`
- [ ] Sandbox code tested locally before deployment
- [ ] GPU instance has sufficient disk/memory for workload
- [ ] Webhook endpoint (if used) is publicly accessible and returns 200 OK
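The last checklist item can be verified with a minimal standard-library receiver (a sketch; the event payload shape is an assumption, so consult the docs for the real `ASYNC_TASK_RESULT` schema):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        # Handle ASYNC_TASK_RESULT events here (e.g. persist `event`).
        self.send_response(200)  # acknowledge with 200 OK
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep test/demo output quiet

# HTTPServer(("", 8080), WebhookHandler).serve_forever()  # uncomment to run
```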

## Resources

- **Full navigation**: https://novita.ai/docs/llms.txt
- **LLM API guide**: https://novita.ai/docs/guides/llm-api
- **Batch API guide**: https://novita.ai/docs/guides/llm-batch-api
- **Agent Sandbox overview**: https://novita.ai/docs/guides/sandbox-overview
- **GPU instances**: https://novita.ai/docs/guides/gpu-instance-overview
- **Error codes**: https://novita.ai/docs/api-reference/basic-error-code
- **Rate limits**: https://novita.ai/docs/guides/llm-rate-limits

---

> For additional documentation and navigation, see: https://novita.ai/docs/llms.txt