This document covers advanced features and best practices for Novita Agent Runtime.

Configuration File Reference

.novita-agent.yaml Structure

The .novita-agent.yaml configuration file uses Kubernetes-style YAML format:
apiVersion: v1
kind: Agent
metadata:
  name: my-agent              # Agent name (must consist of lowercase letters, numbers, and hyphens only)
  version: 1.0.0              # Agent version (semantic versioning)
  author: you@example.com     # Author email (required)
  description: My AI Agent    # Agent description (optional)
  created: '2025-10-23T10:30:00Z'  # Creation time (auto-generated)

spec:
  entrypoint: app.py          # Python entry file (must be .py file)
  
  # Environment variables configuration (optional)
  envVars:
    MODEL_NAME: deepseek/deepseek-v3-0324
    TEMPERATURE: '0.7'
  
  # Runtime configuration (optional, applied to the built sandbox template)
  runtime:
    timeout: 300              # Startup timeout in seconds (1-3600, default 300)
    memory_limit: 1Gi         # Memory limit (supports "512Mi", "1Gi", etc.)
    cpu_limit: '1'            # CPU limit (supports "1", "1000m", etc.)

# Status field (maintained by the system, should not be modified manually by users)
status:
  phase: deployed             # Current phase: pending/building/deployed/failed
  agent_id: agent-xxxxx       # Agent ID (auto-generated after deployment)
  last_deployed: '2025-10-23T10:35:00Z'  # Last deployment time
  build_id: build_xyz789      # Build ID (auto-generated after deployment)

Modifying Configuration

Modifying CPU and Memory Settings

Modify resource configuration under spec.runtime in .novita-agent.yaml:
spec:
  runtime:
    # CPU configuration
    cpu_limit: '2'        # 2 CPU cores
    # Memory configuration
    memory_limit: 2Gi     # 2 GB memory

Modifying Environment Variables

Note that spec.envVars in .novita-agent.yaml is only used by the CLI's agent invoke command; it is not passed to the deployed sandbox template. Modify environment variables under spec.envVars:
spec:
  envVars:
    # LLM configuration
    MODEL_NAME: deepseek/deepseek-v3-0324
    TEMPERATURE: '0.7'
Note:
  • ⚠️ Do not store sensitive information (such as API Keys) in .novita-agent.yaml
  • You can also pass environment variables via the --env parameter when running the agent invoke command, as shown below
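
For example (the KEY=VALUE syntax shown here is an assumption; check the CLI help for the authoritative form):
# Override an environment variable at invocation time
npx novita-sandbox-cli agent invoke --env TEMPERATURE=0.9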

Redeploy to Apply Configuration Changes

After modifying resource specifications in .novita-agent.yaml, you must redeploy for the changes to take effect:
# Redeploy (creates a new version)
npx novita-sandbox-cli agent launch

Environment Variables Management

There are two ways to pass environment variables to Agents running in sandbox instances:

Method 1: Define in Configuration File (CLI invocation only)

Define environment variables under spec.envVars in .novita-agent.yaml:
spec:
  envVars:
    MODEL_NAME: deepseek/deepseek-v3-0324
    TEMPERATURE: '0.7'

Method 2: Pass Dynamically via SDK

When invoking an Agent with the SDK's invoke_agent_runtime method, pass environment variables dynamically via the envVars parameter:
import asyncio
import json
import os
from novita_sandbox.agent_runtime import AgentRuntimeClient

async def main():
    client = AgentRuntimeClient(api_key=os.getenv("NOVITA_API_KEY"))

    payload = json.dumps({"prompt": "Hello"}).encode()

    response = await client.invoke_agent_runtime(
        agentId="agent-xxxxx",
        payload=payload,
        envVars={
            # Read sensitive information from environment variables
            "NOVITA_API_KEY": os.getenv("NOVITA_API_KEY"),
            "DATABASE_PASSWORD": os.getenv("DATABASE_PASSWORD"),

            # Or pass values directly
            "MODEL_NAME": "deepseek/deepseek-v3-0324",
            "TEMPERATURE": "0.7"
        }
    )
    print(response)

asyncio.run(main())

Streaming Responses

Implementing Streaming with Synchronous Generators

Use Python generators to implement streaming responses:
from novita_sandbox.agent_runtime import AgentRuntimeApp

app = AgentRuntimeApp()

@app.entrypoint
def streaming_agent(request: dict):
    """Synchronous streaming response"""
    prompt = request.get("prompt", "")

    # Yield chunks from a generator (generate_response is a user-supplied helper; see the sketch below)
    for i, chunk in enumerate(generate_response(prompt)):
        yield {
            "chunk": chunk,
            "type": "content",
            "index": i
        }
    # Send end marker
    yield {"chunk": "", "type": "end"}

Implementing Streaming with Async Generators

Use Python async generators:
import asyncio

@app.entrypoint
async def async_streaming_agent(request: dict):
    """Async streaming response"""
    prompt = request.get("prompt", "")

    async for chunk in async_generate_response(prompt):
        yield {
            "chunk": chunk,
            "type": "content"
        }
    yield {"chunk": "", "type": "end"}

LangChain Streaming Response Example

Complete example using LangChain for streaming responses:
import os
from langchain_openai import ChatOpenAI
from langchain.callbacks.base import BaseCallbackHandler
from novita_sandbox.agent_runtime import AgentRuntimeApp

app = AgentRuntimeApp()

# Optional: a callback handler is another way to collect streamed tokens.
# It is not used below; llm.stream() is the simpler path for this example.
class StreamingHandler(BaseCallbackHandler):
    """Streaming callback handler (alternative to llm.stream)"""
    def __init__(self):
        self.tokens = []
    
    def on_llm_new_token(self, token: str, **kwargs):
        self.tokens.append(token)

@app.entrypoint
def langchain_streaming_agent(request: dict):
    """LangChain streaming response"""
    prompt = request.get("prompt", "")

    # Create a streaming-enabled LLM.
    # Note: ChatOpenAI targets the OpenAI API by default; with a Novita key
    # you will likely also need to set base_url to Novita's OpenAI-compatible
    # endpoint and pass the model name you want to use.
    llm = ChatOpenAI(
        api_key=os.getenv("NOVITA_API_KEY"),
        streaming=True
    )
    
    # Stream invocation
    for chunk in llm.stream(prompt):
        if chunk.content:
            yield {
                "chunk": chunk.content,
                "type": "content"
            }
    yield {"chunk": "", "type": "end"}

Invoking a Streaming Agent

Invoke a streaming Agent using the SDK:
import asyncio
import json
import os
from novita_sandbox.agent_runtime import AgentRuntimeClient

async def call_streaming_agent():
    client = AgentRuntimeClient(api_key=os.getenv("NOVITA_API_KEY"))
    
    payload = json.dumps({
        "prompt": "Tell me a story"
    }).encode()
    
    response = await client.invoke_agent_runtime(
        agentId="agent-xxxxx",
        payload=payload
    )
    
    # The returned response contains the chunks streamed by the Agent
    print("Streaming response:")
    print(response)

asyncio.run(call_streaming_agent())

Version Management

Deploying a New Agent Version

Modify the version number and deploy a new version:
# Modify version number
npx novita-sandbox-cli agent configure --agent-version 1.1.0

# Deploy new version
npx novita-sandbox-cli agent launch
After a successful deployment, a new agent_id is generated; each deployment produces a unique agent_id that corresponds to a specific version.

Health Checks

Default Health Check Endpoint

AgentRuntimeApp automatically provides a /ping health check endpoint:
from novita_sandbox.agent_runtime import AgentRuntimeApp

app = AgentRuntimeApp()

# Default health check automatically responds with {"status": "Healthy"}
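
During local development you can probe the endpoint directly (the port below is an assumption; use whichever port your app actually listens on):
# Probe the health check endpoint
curl http://localhost:8080/ping
# Expected: {"status": "Healthy"}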

Custom Health Checks

Use the @app.ping decorator to customize health check logic:
@app.ping
def custom_health_check():
    """Custom health check"""
    # Check dependent services
    db_ok = check_database_connection()
    llm_ok = check_llm_service()
    
    if db_ok and llm_ok:
        return {"status": "Healthy"}
    elif db_ok or llm_ok:
        return {"status": "HealthyBusy"}  # Partially available
    else:
        return {"status": "Unhealthy"}  # Unavailable

def check_database_connection():
    """Check database connection"""
    try:
        # Simulate database check (replace with a real connectivity probe)
        return True
    except Exception:
        return False

def check_llm_service():
    """Check LLM service"""
    try:
        # Simulate LLM service check (replace with a real availability probe)
        return True
    except Exception:
        return False

Supported Health Check Statuses

Agents can return the following health statuses:
Status        Description                                                   HTTP Status Code
Healthy       Agent is fully available                                      200
HealthyBusy   Agent is partially available (e.g., processing heavy load)    200
Unhealthy     Agent is unavailable                                          503

Multi-turn Conversations

Using Session ID for Multi-turn Conversations

Use the runtimeSessionId parameter to route multiple requests to the same sandbox instance:
import asyncio
import json
import os
import uuid
from novita_sandbox.agent_runtime import AgentRuntimeClient

async def multi_turn_conversation():
    runtime_session_id = str(uuid.uuid4())
    client = AgentRuntimeClient(api_key=os.getenv("NOVITA_API_KEY"))
    agent_id = "agent-xxxxx"
    
    # First turn
    response1 = await client.invoke_agent_runtime(
        agentId=agent_id,
        payload=json.dumps({"prompt": "Hello, my name is John"}).encode(),
        runtimeSessionId=runtime_session_id,
    )
    print(f"AI: {response1}")
    
    # Second turn (sent to the same sandbox instance, Agent remembers the context)
    response2 = await client.invoke_agent_runtime(
        agentId=agent_id,
        payload=json.dumps({"prompt": "What's my name?"}).encode(),
        runtimeSessionId=runtime_session_id,
    )
    print(f"AI: {response2}")  # Should answer "John"

Example Projects

We provide a complete example project based on LangGraph, demonstrating how to build real AI applications with Novita Agent Runtime.

Project Repository

🔗 https://github.com/novitalabs/Novita-CollabHub/tree/main/examples/agent-runtime/agentic-frameworks/langgraph