1. Prepare Container Image
You need to package your runtime environment into a Docker image and upload it to an image repository in advance (a minimal build-and-push sketch follows below). Both public and private image repositories are supported (credentials are required for private repositories).
- You can upload your image to Docker Hub. The platform currently provides an image warm-up service for Docker Hub.
- This guide uses the runpod/worker-comfyui:5.5.0-flux1-dev model image as its example. When using the worker-comfyui image, configure S3-related settings such as BUCKET_ENDPOINT_URL. These settings ensure that images and videos generated by the async serverless endpoint are uploaded to your S3 bucket.
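If you are packaging your own runtime, the commands below are a minimal sketch of building and pushing an image to Docker Hub; the repository name my-team/my-worker and the tag v1 are hypothetical placeholders.

```bash
# Build the image from your project's Dockerfile (hypothetical names).
docker build -t my-team/my-worker:v1 .

# Log in to Docker Hub and push so the platform can pull the image.
docker login
docker push my-team/my-worker:v1
```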
2. Select Instance Specification
Currently, Async Serverless Endpoint supports the following GPU instance types:
- RTX 4090 24GB
- H100 SXM 80GB
3. Create Cloud Storage (Optional)
If you need shared or persistent storage, you can create cloud storage on the storage management page and mount it when creating an instance. For more details, see Manage Cloud Storage.
4. Create Endpoint
- Go to the Async Serverless GPUs page, choose an instance type, and click “Create Endpoint”.
- Complete the Endpoint parameter configuration:
- Endpoint Name: Uniquely identifies the endpoint and becomes part of the URL used when creating jobs. The system generates a random default name; you may customize it, but using the default name is recommended.
- Worker Configuration
| Configuration Item | Description |
|---|---|
| Min Worker Count | The minimum number of worker instances to keep for the endpoint. Setting a higher minimum helps reduce cold start time. If set to 0, there will be no idle workers when there are no requests, which may increase response time for incoming requests. For latency-sensitive scenarios, use 0 with caution. |
| Max Worker Count | The maximum number of worker instances that the endpoint can scale up to. When request volume increases, the platform automatically increases workers up to this maximum. This limit helps control costs. |
| Idle Timeout (seconds) | When a worker is about to be released during scale-down, the platform keeps it alive for the specified idle timeout so it can respond quickly to new requests. Note that the worker is billed during this period. |
| Max Concurrent Requests | The maximum number of concurrent requests handled by a single worker. If this is exceeded, requests will be routed to other workers. If all workers are fully occupied, excess requests will be queued until execution is possible. |
| GPUs / Worker | Number of GPU cards allocated to each worker. |
| CUDA Version | Specify the CUDA version supported for the worker. |
- Type: Select the Endpoint type; choose Async (asynchronous).
- Elastic Policy:
  - Queue request policy: The number of Workers is automatically scaled according to the number of queued requests. By default, each Worker processes only one job at a time; you need to specify the maximum number of requests each Worker supports.
- Image Configuration:
  - Image address: The address of the image to deploy, e.g., runpod/worker-comfyui:5.5.0-flux1-dev.
  - Image repository credentials: If using a private image, provide access credentials so the image can be pulled. You can create credentials on the security credentials management page.
  - HTTP Port: The HTTP port to expose on the Worker.
  - Container start command: The command to run when the container starts.
- Storage Configuration:
  - System disk: System disk size per Worker instance.
  - Cloud storage: Select your cloud storage if you wish to mount it. For details, see Manage Cloud Storage.
- Other:
  - Health check path: This parameter is currently not enabled.
  - Environment variables: Set the environment variables your service needs; they are initialized automatically when the Worker starts. For example (you can sanity-check these S3 credentials with the sketch after this list):

        BUCKET_ENDPOINT_URL=https://<your-bucket-name>.s3.<aws-region>.amazonaws.com
        BUCKET_ACCESS_KEY_ID=AKIASVYYYN6L4S6TTTTTT
        BUCKET_SECRET_ACCESS_KEY=maVz2OwY98UUUUUUGjMsmR/Yo8/Zzw0qWMMMMMMM
- Review pricing, then click “Deploy with One Click”.
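The snippet below is an optional sketch for verifying the S3 credentials from the example above before deploying; it assumes the AWS CLI is installed locally and that you substitute your real bucket name and region for the placeholders.

```bash
# Hypothetical check: list the bucket with the same credentials the Worker will use.
AWS_ACCESS_KEY_ID=AKIASVYYYN6L4S6TTTTTT \
AWS_SECRET_ACCESS_KEY='maVz2OwY98UUUUUUGjMsmR/Yo8/Zzw0qWMMMMMMM' \
aws s3 ls "s3://<your-bucket-name>" --region <aws-region>
```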
5. Access the Service
- In Async Serverless GPUs, find the newly created Endpoint and ensure its status is “Running”.
- Ensure that at least one Worker in the Endpoint is running.
- Ensure you have the corresponding API key for authentication. The Endpoint creator and the API key owner must belong to the same team.
5.1 Create a Job and Retrieve Output via Curl
Below is an example showing actual use of the worker-comfyui worker. Replace 0f43a6867e05fddd in the URL with your real endpointName, and replace sk_xxxx in the example with your actual user API key.
The maximum job size accepted by Async Serverless Endpoint is 4 MiB.
Create the job (the id field in the response is the job_id):
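The request below is a minimal sketch: the host api.example.com and the /v2/<endpointName>/run path are assumptions modeled on RunPod-compatible workers, not confirmed values for this platform, and the input payload is a placeholder for your actual ComfyUI workflow JSON.

```bash
# Submit an async job (hypothetical host and path; substitute your platform's values).
curl -X POST "https://api.example.com/v2/0f43a6867e05fddd/run" \
  -H "Authorization: Bearer sk_xxxx" \
  -H "Content-Type: application/json" \
  -d '{"input": {"workflow": {"comment": "your ComfyUI workflow JSON here"}}}'
```

The response includes an id field; use it as the job_id when querying status.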
The maximum output you can retrieve via the status API of the Async Serverless Endpoint is 4 MiB. To avoid this limit, configure the S3 environment variables and upload output images or videos to S3 in your handler.py, so the output size is not constrained. Job results are kept in the Async Serverless Endpoint for up to 6 hours after completion.
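As a sketch of retrieving the result (same hypothetical host and path conventions as the run request above; the status route is an assumption):

```bash
# Poll the job status; replace <job_id> with the id returned by the run request.
curl "https://api.example.com/v2/0f43a6867e05fddd/status/<job_id>" \
  -H "Authorization: Bearer sk_xxxx"
```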