If you don’t have a Novita account, sign up first. For details, see the Quickstart guide. This article uses the ComfyUI worker image novitalabs/comfyui-worker:v0.0.1 as an example to show how to create and call an Async Serverless Endpoint.

1. Prepare Container Image

Package your runtime environment into a Docker image and upload it to an image registry in advance. Both public and private image registries are supported. Private registries require image pull credentials.
  • You can upload your image to Docker Hub. The platform currently provides an image warm-up service for Docker Hub images.
This example uses novitalabs/comfyui-worker:v0.0.1. The image includes ComfyUI and the Novita worker SDK. The task input is a ComfyUI workflow JSON, and the worker handler returns generated image results. We recommend configuring object storage environment variables such as BUCKET_ENDPOINT_URL, so generated images and videos can be uploaded to your bucket and returned as URLs in the job output.
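As a minimal sketch of preparing these variables, the helper below renders the four BUCKET_* settings listed in step 4 as KEY=VALUE lines and rejects empty values. The function name render_bucket_env and its validation rule are hypothetical and only for illustration; the variable names and the S3 endpoint pattern are the ones used in this article.

```python
def render_bucket_env(region, access_key_id, secret_access_key, bucket_name):
    """Return the BUCKET_* variables as KEY=VALUE lines (hypothetical helper)."""
    inputs = {
        "region": region,
        "access_key_id": access_key_id,
        "secret_access_key": secret_access_key,
        "bucket_name": bucket_name,
    }
    # Fail early instead of deploying an endpoint with a blank setting.
    missing = [name for name, value in inputs.items() if not value]
    if missing:
        raise ValueError("missing values for: " + ", ".join(missing))

    env = {
        # Endpoint pattern taken from the S3 example in this article.
        "BUCKET_ENDPOINT_URL": f"https://s3.{region}.amazonaws.com",
        "BUCKET_ACCESS_KEY_ID": access_key_id,
        "BUCKET_SECRET_ACCESS_KEY": secret_access_key,
        "BUCKET_NAME": bucket_name,
    }
    return "\n".join(f"{name}={value}" for name, value in env.items())
```

The returned text can be pasted into the Environment variables field when you create the endpoint.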

2. Select Instance Specification

Async Serverless Endpoint currently supports the following GPU instance types:
  • RTX 4090 24GB
  • H100 SXM 80GB
For this comfyui-worker example, we recommend RTX 4090 24GB. For additional requirements, contact us.

3. Create Cloud Storage (Optional)

If you need shared or persistent storage, create cloud storage on the storage management page, then mount the storage when creating the endpoint. For details, see Manage Cloud Storage.

4. Create Endpoint

  1. Go to the Async Serverless GPUs page, select an instance type, and click “Create Endpoint”.
  2. Complete the Endpoint parameter configuration.
  • Endpoint Name: Used to uniquely identify the Endpoint. It is part of the URL when creating jobs. The system generates a random default name. You can customize it, but using the default name is recommended.
  • Worker Configuration
| Configuration Item | Description |
| --- | --- |
| Min Worker Count | The minimum number of worker instances to keep for the Endpoint. Setting a higher minimum helps reduce cold start time. If set to 0, there will be no idle workers when there are no requests, which may increase response latency for new requests. Use 0 with caution for latency-sensitive scenarios. |
| Max Worker Count | The maximum number of worker instances that the Endpoint can scale up to. When request volume increases, the platform automatically increases workers up to this maximum. This limit helps control costs. |
| Idle Timeout (seconds) | When a worker is about to be released due to scale-down, the platform keeps it for the configured idle timeout so it can respond quickly to new requests. You are charged for the worker during this period. |
| Max Concurrent Requests | The maximum number of concurrent requests handled by one worker. If this is exceeded, requests are routed to other workers. If all workers are fully occupied, excess requests are queued until execution is possible. |
| GPUs / Worker | Number of GPU cards allocated to each worker. |
| CUDA Version | CUDA version used by the worker. |
For this example, select RTX 4090 24GB and set GPUs / Worker to 1.
  • Type:
    • Select Async.
  • Elastic Policy:
    • Select Queue request policy.
    • Set Single worker target concurrency to 1. The ComfyUI worker in this example processes one job at a time. When queued requests exceed current worker capacity, the platform scales workers based on the queue request count until reaching the maximum worker count.
  • Image Configuration:
    • Image address: novitalabs/comfyui-worker:v0.0.1.
    • Image repository credentials: If the image is private, provide image pull credentials. You can create credentials on the security credentials management page.
    • HTTP Port: Worker HTTP port.
    • Container start command: Command executed when the container starts.
  • Storage Configuration:
    • System disk: System disk size per worker instance.
    • Cloud storage: Select cloud storage if you need to mount it. For details, see Manage Cloud Storage.
  • Other:
    • Health check path: This parameter is currently not enabled.
    • Environment variables: Set environment variables required by the service. Example S3 configuration:
BUCKET_ENDPOINT_URL=https://s3.<aws-region>.amazonaws.com
BUCKET_ACCESS_KEY_ID=<your-access-key-id>
BUCKET_SECRET_ACCESS_KEY=<your-secret-access-key>
BUCKET_NAME=<your-bucket-name>
When using comfyui-worker, we strongly recommend configuring object storage so output images are uploaded to a bucket and returned as URLs.
  3. Review pricing and click “Deploy with One Click”.
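To build intuition for how the worker counts and the queue request policy interact, the sketch below models the scaling decision as the number of workers needed to cover pending requests, clamped between the minimum and maximum worker counts. This is a simplified illustration, not the platform's actual algorithm, which this article does not specify; the function desired_workers and its parameters are hypothetical.

```python
import math


def desired_workers(pending_requests, target_concurrency, min_workers, max_workers):
    """Illustrative model only: workers needed for the pending requests,
    clamped to the configured minimum and maximum worker counts."""
    if target_concurrency < 1:
        raise ValueError("target_concurrency must be at least 1")
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    # Each worker is assumed to absorb target_concurrency requests at a time.
    needed = math.ceil(pending_requests / target_concurrency)
    return max(min_workers, min(needed, max_workers))
```

Under this model, with a target concurrency of 1, a minimum of 0, and a maximum of 3, five pending requests give desired_workers(5, 1, 0, 3) == 3 and the remaining two requests wait in the queue. With no pending requests the result is 0, which is the cold start case described for Min Worker Count.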

5. Access the Service

  1. On the Async Serverless GPUs page, find the newly created Endpoint and ensure its status is “Running”.
  2. Ensure that at least one Worker in the Endpoint is running.
  3. Ensure you have an API Key for authentication. The Endpoint creator and the API Key owner must belong to the same team.
You need the following information to call an Async Serverless Endpoint:
| Parameter | Description |
| --- | --- |
| Public base URL | https://async-public.serverless.novita.ai/v1 |
| Endpoint Name | The name generated after creating the Endpoint, for example 0f43a6867e05fddd. This name is part of the job URL. |
| API Key | Create or copy an API Key from the API Key / Key Management page. Pass it in the Authorization: Bearer <API_KEY> request header. |
Get an API Key:
  1. Log in to the Novita console.
  2. Go to the API Key / Key Management page.
  3. Create an API Key and copy the generated sk_... value.
  4. Ensure the API Key owner and Endpoint owner are in the same team.
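The three pieces of information above combine into request URLs and headers as sketched below. The helper names job_url and auth_headers are hypothetical; the base URL, the Bearer header, and the run, status, cancel, and health paths follow the curl examples in section 5.1.

```python
BASE_URL = "https://async-public.serverless.novita.ai/v1"


def job_url(endpoint_name, action, job_id=None):
    """Build a request URL for one of: run, status, cancel, health."""
    if action in ("status", "cancel"):
        # These two actions address a specific job.
        if not job_id:
            raise ValueError(f"{action} requires a job_id")
        return f"{BASE_URL}/{endpoint_name}/{action}/{job_id}"
    if action in ("run", "health"):
        return f"{BASE_URL}/{endpoint_name}/{action}"
    raise ValueError(f"unknown action: {action}")


def auth_headers(api_key):
    """Headers used by the curl examples: Bearer token plus a JSON body type."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

For example, job_url("0f43a6867e05fddd", "run") yields the URL used to create a job in the next section.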

5.1 Create a Job and Retrieve Output via Curl

The following request is a runnable, tested comfyui-worker example. Replace 0f43a6867e05fddd in the URL with your real Endpoint name, and replace sk_xxxx with your real API Key.
The maximum job size accepted by Async Serverless Endpoint is 4 MiB.
curl -X POST https://async-public.serverless.novita.ai/v1/0f43a6867e05fddd/run \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk_xxxx' \
  -d '{
    "input": {
      "workflow": {
        "4": {
          "class_type": "CheckpointLoaderSimple",
          "inputs": {
            "ckpt_name": "flux1-dev-fp8.safetensors"
          }
        },
        "5": {
          "class_type": "EmptyLatentImage",
          "inputs": {
            "width": 512,
            "height": 512,
            "batch_size": 1
          }
        },
        "6": {
          "class_type": "CLIPTextEncode",
          "inputs": {
            "clip": ["4", 1],
            "text": "a red apple on a table"
          }
        },
        "7": {
          "class_type": "CLIPTextEncode",
          "inputs": {
            "clip": ["4", 1],
            "text": "blurry, low quality"
          }
        },
        "3": {
          "class_type": "KSampler",
          "inputs": {
            "model": ["4", 0],
            "positive": ["6", 0],
            "negative": ["7", 0],
            "latent_image": ["5", 0],
            "seed": 42,
            "steps": 10,
            "cfg": 7,
            "sampler_name": "euler",
            "scheduler": "normal",
            "denoise": 1
          }
        },
        "8": {
          "class_type": "VAEDecode",
          "inputs": {
            "samples": ["3", 0],
            "vae": ["4", 2]
          }
        },
        "9": {
          "class_type": "SaveImage",
          "inputs": {
            "filename_prefix": "test",
            "images": ["8", 0]
          }
        }
      },
      "output_node_id": "9"
    }
}'
Response example, where id is the job_id:
{"id":"8cb6a77c-62aa-4eb4-9226-1ca5724fd9dd","status":"PENDING"}
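Because jobs larger than 4 MiB are not accepted, it can help to measure the request body before submitting it. The sketch below serializes the payload the same way the curl example wraps it (under an "input" key) and compares the byte length to the limit. Whether the platform measures the limit on exactly this serialized body is an assumption, and the helper names are hypothetical.

```python
import json

MAX_JOB_BYTES = 4 * 1024 * 1024  # 4 MiB limit stated in this article


def job_body_size(input_payload):
    """Size in bytes of the JSON body that wraps the payload under "input"."""
    body = json.dumps({"input": input_payload}).encode("utf-8")
    return len(body)


def check_job_size(input_payload):
    """Return the body size, or raise if it exceeds the 4 MiB limit."""
    size = job_body_size(input_payload)
    if size > MAX_JOB_BYTES:
        raise ValueError(
            f"job body is {size} bytes, above the {MAX_JOB_BYTES}-byte limit"
        )
    return size
```

A workflow that embeds large base64 images is the most likely way to exceed the limit; passing such inputs by URL keeps the body small.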
Check job status and retrieve results
The maximum output size returned by the Async Serverless Endpoint status API is 4 MiB. To avoid this limitation, configure object storage environment variables and return uploaded file URLs in the output.
Job results are kept in the Async Serverless Endpoint for up to 6 hours after completion.
In the URL below, replace the job ID with the id returned when you created the job.
curl -X GET https://async-public.serverless.novita.ai/v1/0f43a6867e05fddd/status/33a0bc4b-7312-41f6-ad15-eb9016bd68f9 \
  -H 'Authorization: Bearer sk_xxxx'
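Since jobs run asynchronously, a client usually polls the status API until the job finishes. The following is a minimal polling sketch using only the Python standard library. The only status value shown in this article is PENDING, so the set of terminal statuses is passed in by the caller rather than hard-coded; the helper names, the polling interval, and the timeout are assumptions.

```python
import json
import time
import urllib.request

BASE_URL = "https://async-public.serverless.novita.ai/v1"


def fetch_status(endpoint_name, job_id, api_key):
    """GET the status document for one job and decode it as JSON."""
    request = urllib.request.Request(
        f"{BASE_URL}/{endpoint_name}/status/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_job(fetch, terminal_statuses, timeout=300.0, interval=2.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until its "status" is in terminal_statuses or time runs out."""
    deadline = clock() + timeout
    while True:
        result = fetch()
        if result.get("status") in terminal_statuses:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"job still {result.get('status')!r} after {timeout}s")
        sleep(interval)
```

A call might look like wait_for_job(lambda: fetch_status(name, job_id, key), {"COMPLETED", "FAILED"}), where the two status names are guesses to be checked against the responses your endpoint actually returns. Fetch the result well within the 6-hour retention window.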
Cancel Job
curl -X POST https://async-public.serverless.novita.ai/v1/0f43a6867e05fddd/cancel/e5f3c3c0-c3b1-49c2-9452-bb96eaa34ce6 \
  -H 'Authorization: Bearer sk_xxxx'
Check Endpoint Job Queue Status
curl -X GET https://async-public.serverless.novita.ai/v1/0f43a6867e05fddd/health \
  -H 'Authorization: Bearer sk_xxxx'
Response example:
{
  "workers": {
    "idle": 0,
    "running": 0,
    "throttled": 0,
    "total": 0
  },
  "jobs": {
    "completed": 0,
    "failed": 0,
    "inProgress": 0,
    "inQueue": 0,
    "retried": 0
  }
}
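The health response can be reduced to a few numbers that are easier to act on. The sketch below is one possible reading of the fields: it treats inQueue plus inProgress as the outstanding backlog and a total worker count of 0 as a sign that the next job will wait for a worker to start. The function summarize_health and these interpretations are assumptions, not documented semantics.

```python
def summarize_health(health):
    """Condense a /health response into backlog and worker availability."""
    workers = health.get("workers", {})
    jobs = health.get("jobs", {})
    return {
        # Jobs that are waiting or currently being processed.
        "backlog": jobs.get("inQueue", 0) + jobs.get("inProgress", 0),
        # At least one worker is up and not busy.
        "has_idle_worker": workers.get("idle", 0) > 0,
        # No workers at all: a new job likely incurs a cold start.
        "no_workers": workers.get("total", 0) == 0,
    }
```

For the example response above, the summary is a backlog of 0, no idle worker, and no workers at all, which is what an endpoint with Min Worker Count set to 0 looks like when it is not receiving requests.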

5.2 Create Job and Get Results via Novita SDK

Install the SDK:
pip install novita-gpus
Create a job and wait for its output:
import novita_gpus

novita_gpus.api_key = "sk_xxxx"

input_payload = {
    "workflow": {
        "4": {
            "class_type": "CheckpointLoaderSimple",
            "inputs": {"ckpt_name": "flux1-dev-fp8.safetensors"},
        },
        "5": {
            "class_type": "EmptyLatentImage",
            "inputs": {"width": 512, "height": 512, "batch_size": 1},
        },
        "6": {
            "class_type": "CLIPTextEncode",
            "inputs": {"clip": ["4", 1], "text": "a red apple on a table"},
        },
        "7": {
            "class_type": "CLIPTextEncode",
            "inputs": {"clip": ["4", 1], "text": "blurry, low quality"},
        },
        "3": {
            "class_type": "KSampler",
            "inputs": {
                "model": ["4", 0],
                "positive": ["6", 0],
                "negative": ["7", 0],
                "latent_image": ["5", 0],
                "seed": 42,
                "steps": 10,
                "cfg": 7,
                "sampler_name": "euler",
                "scheduler": "normal",
                "denoise": 1,
            },
        },
        "8": {
            "class_type": "VAEDecode",
            "inputs": {"samples": ["3", 0], "vae": ["4", 2]},
        },
        "9": {
            "class_type": "SaveImage",
            "inputs": {"filename_prefix": "test", "images": ["8", 0]},
        },
    },
    "output_node_id": "9",
}

endpoint = novita_gpus.Endpoint("0f43a6867e05fddd")
job = endpoint.run(input_payload)

print(job.status())
output = job.output(timeout=300)
print(output)
The novita-gpus SDK default request URL is https://async-public.serverless.novita.ai/v1.
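The example above hard-codes the API Key for brevity. As a small sketch of an alternative, the helper below reads the key from an environment variable so that it stays out of source control. The variable name NOVITA_API_KEY and the function load_api_key are hypothetical; they are not part of the SDK.

```python
import os


def load_api_key(env=os.environ, name="NOVITA_API_KEY"):
    """Read the API Key from the environment (variable name is an assumption)."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the script")
    return key
```

The result can then be assigned in place of the literal, for example novita_gpus.api_key = load_api_key().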

6. Manage Async Serverless Endpoint

See Manage Serverless Endpoint.