Create Endpoint

curl --request POST \
  --url https://api.novita.ai/gpu-instance/openapi/v1/endpoint/create \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: <content-type>' \
  --data '{
  "endpoint": {
    "name": "<string>",
    "appName": "<string>",
    "workerConfig": [
      {
        "minNum": 123,
        "maxNum": 123,
        "freeTimeout": 123,
        "maxConcurrent": 123,
        "gpuNum": 123
      }
    ],
    "ports": [
      {
        "port": "<string>"
      }
    ],
    "policy": [
      {
        "type": "<string>",
        "value": 123
      }
    ],
    "image": [
      {
        "image": "<string>",
        "authId": "<string>",
        "command": "<string>"
      }
    ],
    "products": [
      {
        "id": "<string>"
      }
    ],
    "rootfsSize": 123,
    "volumeMounts": [
      {
        "type": "<string>",
        "size": 123,
        "id": "<string>",
        "mountPath": "<string>"
      }
    ],
    "clusterID": "<string>",
    "envs": [
      {
        "key": "<string>",
        "value": "<string>"
      }
    ],
    "healthy": {
      "path": "<string>"
    }
  }
}'

{
  "id": "<string>"
}

Request Headers

Content-Type

string

required

Enum: application/json

Authorization

string

required

Bearer authentication format, for example: Bearer {{API Key}}.
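
As a minimal sketch, the two required headers could be assembled in Python as shown here. The helper name `build_headers` is hypothetical and not part of any official SDK; only the header names and values come from the description above.

```python
def build_headers(api_key: str) -> dict:
    """Return the two required request headers.

    `build_headers` is an illustrative helper, not an official API.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        "Content-Type": "application/json",    # the only allowed value
        "Authorization": f"Bearer {api_key}",  # Bearer authentication format
    }
```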

Request Body

endpoint

object

required

Endpoint configuration details.

name

string

Endpoint name. String, length limit: 0-220 characters.

appName

string

Application name (appears in the URL). This is a customizable part of the Endpoint URL and defaults to the Endpoint ID if not specified.

workerConfig

object[]

required

Worker configuration. The valid range can be dynamically obtained via the parameter range API.

minNum

integer

required

Minimum number of workers.

maxNum

integer

required

Maximum number of workers.

freeTimeout

integer

required

Idle timeout, in seconds.

maxConcurrent

integer

required

Maximum concurrency.

gpuNum

integer

required

Number of GPUs per worker.

ports

object[]

required

HTTP ports. Only one is supported. Supported port range: 1-65535, except for 2222, 2223, and 2224 which are reserved for internal use.

port

string

required

HTTP port.
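
The port rules above can be checked on the client before sending a request. This is a sketch only: `build_ports` and `RESERVED_PORTS` are hypothetical names, and the server may apply further checks not described here.

```python
RESERVED_PORTS = {2222, 2223, 2224}  # reserved for internal use

def build_ports(port: int) -> list:
    """Validate a single HTTP port and return the `ports` array."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port {port} is outside the range 1-65535")
    if port in RESERVED_PORTS:
        raise ValueError(f"port {port} is reserved for internal use")
    # The API takes the port as a string and supports only one entry.
    return [{"port": str(port)}]
```

For example, `build_ports(8080)` yields `[{"port": "8080"}]`, while `build_ports(2222)` raises `ValueError`.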

policy

object[]

required

Auto-scaling policy. The valid range can be dynamically obtained via the parameter range API.

type

string

required

Policy type. Options:

queue: Queue latency policy, adjusts the number of workers based on the waiting time of requests in the queue.
concurrency: Queue request policy, automatically adjusts the number of workers based on the number of requests in the queue.

value

integer

required

The meaning of value depends on the type:

If type = queue, value is the queue waiting time in seconds.
If type = concurrency, value is the maximum number of requests in the queue.
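
A small sketch of building the `policy` array for either type. The helper `build_policy` is hypothetical. It does not check the value range, because the valid range has to be fetched from the parameter range API.

```python
POLICY_TYPES = {"queue", "concurrency"}

def build_policy(policy_type: str, value: int) -> list:
    """Return the `policy` array for one auto-scaling policy.

    queue:       value is the queue waiting time in seconds.
    concurrency: value is the maximum number of requests in the queue.
    """
    if policy_type not in POLICY_TYPES:
        raise ValueError(f"unknown policy type: {policy_type!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("value must be an integer")
    return [{"type": policy_type, "value": value}]
```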

image

object[]

required

Image information.

image

string

required

Image URL. String, length limit: 0-511 characters.

authId

string

Private image credential ID (not required for public images or platform user images). String, length limit: 0-255 characters.

command

string

Container startup command. String, length limit: 0-2047 characters.

products

object[]

required

Product information.

id

string

required

Product ID.

rootfsSize

integer

required

System disk size in GB. Currently fixed at 100.

volumeMounts

object[]

required

Storage information. Sizes are in GB.

type

string

required

Storage type. Options:

local: Local storage.
network: Network storage.

size

integer

Local storage size in GB, currently fixed at 30. Not required for network storage.

id

string

Network storage ID. Not required for local storage.

mountPath

string

required

Mount path. String, length limit: 0-255 characters.
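
The two storage types take different fields: local storage uses `size`, network storage uses `id`. The sketch below builds one entry of each kind. The helper names and the example values (`/workspace`, `storage-123`) are hypothetical.

```python
LOCAL_STORAGE_SIZE_GB = 30  # currently a fixed value
MAX_MOUNT_PATH_LEN = 255

def _check_mount_path(mount_path: str) -> str:
    if len(mount_path) > MAX_MOUNT_PATH_LEN:
        raise ValueError("mountPath exceeds 255 characters")
    return mount_path

def local_mount(mount_path: str) -> dict:
    """Local storage entry: `size` is set, `id` is omitted."""
    return {
        "type": "local",
        "size": LOCAL_STORAGE_SIZE_GB,
        "mountPath": _check_mount_path(mount_path),
    }

def network_mount(storage_id: str, mount_path: str) -> dict:
    """Network storage entry: `id` is set, `size` is omitted."""
    return {
        "type": "network",
        "id": storage_id,
        "mountPath": _check_mount_path(mount_path),
    }

volume_mounts = [
    local_mount("/workspace"),
    network_mount("storage-123", "/data"),
]
```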

clusterID

string

Cluster information. Required when mounting cloud storage, and must match the cluster ID where the cloud storage resides. String, length limit: 0-255 characters.

envs

object[]

Environment variables.

key

string

required

Environment variable name.

value

string

required

Environment variable value.

healthy

object

required

Health check endpoint.

path

string

required

Path to be checked via HTTP request for health monitoring.

Response

id

string

The created Endpoint ID.
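
Putting the pieces together, the following sketch prepares the request with Python's standard library and reads the `id` from a response body. It builds the request without sending it. The function names are hypothetical, and every value in `example_endpoint` is a placeholder; in particular, the worker and policy numbers are illustrative and should be checked against the parameter range API.

```python
import json
import urllib.request

CREATE_URL = "https://api.novita.ai/gpu-instance/openapi/v1/endpoint/create"

def build_create_request(api_key: str, endpoint: dict) -> urllib.request.Request:
    """Wrap the endpoint configuration in the required `endpoint` object."""
    body = json.dumps({"endpoint": endpoint}).encode("utf-8")
    return urllib.request.Request(
        CREATE_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

def parse_endpoint_id(response_body: bytes) -> str:
    """Extract the created Endpoint ID from the JSON response."""
    return json.loads(response_body)["id"]

# Placeholder values only; replace them with real IDs and a real image URL.
example_endpoint = {
    "name": "demo-endpoint",
    "workerConfig": [{"minNum": 1, "maxNum": 2, "freeTimeout": 300,
                      "maxConcurrent": 1, "gpuNum": 1}],
    "ports": [{"port": "8000"}],
    "policy": [{"type": "queue", "value": 60}],
    "image": [{"image": "<image-url>"}],
    "products": [{"id": "<product-id>"}],
    "rootfsSize": 100,
    "volumeMounts": [{"type": "local", "size": 30, "mountPath": "/workspace"}],
    "healthy": {"path": "/health"},
}

request = build_create_request("<API Key>", example_endpoint)
```

To actually send it, pass `request` to `urllib.request.urlopen` and feed the response bytes to `parse_endpoint_id`.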
