1. Benefits
With Prompt Cache enabled, you gain:

- Lower Cost: Repeated prompts no longer require full inference. Only minimal cache token fees are charged.
- Lower Latency: Cached results are returned instantly without running the model.
- Higher Throughput: In high-QPS scenarios, Prompt Cache reduces compute load and improves overall system capacity.
- Transparent to Your Application: No additional logic or system changes are required (see the sketch below).
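Because caching happens server-side, a standard request benefits from it automatically. The following is a minimal sketch, assuming an OpenAI-compatible endpoint and the `openai` Python SDK; the base URL and model ID are placeholders, not confirmed values, so check the API reference for the actual ones.

```python
from openai import OpenAI

# Hypothetical client setup; base_url and model are placeholders.
client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",  # assumed endpoint
    api_key="<YOUR_API_KEY>",
)

SYSTEM_PROMPT = "You are a support assistant. Answer concisely."  # reused verbatim

def ask(question: str) -> str:
    # No cache-specific parameters are needed in the request itself;
    # repeated prompt content across calls is what the cache matches on.
    response = client.chat.completions.create(
        model="<model-id>",  # any supported serverless model
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# The second call repeats the same system prompt as the first,
# so that portion of the prompt can be served from the cache.
print(ask("How do I reset my password?"))
print(ask("How do I change my email address?"))
```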
2. Supported Models
Several serverless open-source models currently support prompt cache billing. For pricing details regarding the prompt cache feature of supported models, see the "Cache" section at https://novita.ai/pricing.
3. Use Cases
Prompt Cache is highly effective in workloads with frequently repeated prompts, including but not limited to:

- Template-based Generation
  - Fixed-format summaries
  - Template-driven rewriting
  - Prompts reused across tasks
- Text Classification & Field Extraction
  - Content type classification
  - Tag or key information extraction
- Content Moderation
  - Review of comments, ads, or titles
  - Many moderation prompts repeat across users and time
- Repeated System Prompts in Chat Applications
  - Chatbot persona definitions
  - Global conversation rules
  - Background information reused across multiple turns
- Workflow / Assistant-style Prompts (see the sketch after this list)
  - SQL generation assistants
  - Code repair assistants
  - Summary assistants with fixed output formats
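In assistant-style workflows, cache hits come from repeated prompt content, so keeping the shared instructions byte-identical across requests and placing the variable input at the end tends to help. A minimal sketch, reusing the hypothetical `client` from the earlier example (the schema and model ID are illustrative assumptions):

```python
# A fixed instruction block shared by every request; only the trailing
# user content varies, so the repeated block can be served from the cache.
SQL_ASSISTANT_PROMPT = (
    "You are a SQL generation assistant. "
    "Given a natural-language question, output a single valid SQL query "
    "for the schema below and nothing else.\n\n"
    "Schema:\n"
    "  orders(id, customer_id, total, created_at)\n"
    "  customers(id, name, country)\n"
)

def generate_sql(question: str) -> str:
    response = client.chat.completions.create(
        model="<model-id>",  # placeholder
        messages=[
            {"role": "system", "content": SQL_ASSISTANT_PROMPT},  # repeated, cacheable
            {"role": "user", "content": question},                # varies per request
        ],
    )
    return response.choices[0].message.content

print(generate_sql("Total revenue per country in 2024?"))
```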
4. Response Examples
When the cache is hit, no inference is performed, resulting in significantly lower cost and faster responses.
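As a hedged sketch of what this can look like in practice, assuming an OpenAI-compatible response whose `usage` object reports cached prompt tokens (the exact field names, such as `prompt_tokens_details.cached_tokens`, mirror the OpenAI usage schema and are an assumption here; consult the API reference for the authoritative layout):

```python
response = client.chat.completions.create(
    model="<model-id>",  # placeholder
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},  # same prompt as before
        {"role": "user", "content": "How do I reset my password?"},
    ],
)

usage = response.usage
print("prompt tokens:", usage.prompt_tokens)
print("completion tokens:", usage.completion_tokens)

# Assumed field layout: on a cache hit, cached_tokens would report a
# nonzero count covering the repeated portion of the prompt.
details = getattr(usage, "prompt_tokens_details", None)
if details is not None:
    print("cached prompt tokens:", details.cached_tokens)
```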