
LLM API


Introduction

We provide compatibility with the OpenAI API standard, making it easier to integrate Novita AI into existing applications. The api_base is https://api.novita.ai/v3/openai.

The APIs we support are Chat Completions and Completions.

You can find all the models we support here: https://novita.ai/llm-api.

Example with Python client

pip install 'openai>=1.0.0'
  • Chat Completions API
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    # To create a Novita AI API key, see: https://novita.ai/get-started/Quick_Start.html#_3-create-an-api-key
    api_key="<YOUR Novita AI API Key>",
)

model = "Nous-Hermes-2-Mixtral-8x7B-DPO"
stream = True # or False
max_tokens = 512

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": "Act like you are a helpful assistant.",
        },
        {
            "role": "user",
            "content": "Hi there!",
        }
    ],
    stream=stream,
    max_tokens=max_tokens,
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
  • Completions API
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    # To create a Novita AI API key, see: https://novita.ai/get-started/Quick_Start.html#_3-create-an-api-key
    api_key="<YOUR Novita AI API Key>",
)

model = "Nous-Hermes-2-Mixtral-8x7B-DPO"
stream = True # or False
max_tokens = 512

completion_res = client.completions.create(
    model=model,
    prompt="A chat between a curious user and an artificial intelligence assistant.\nYou are a cooking assistant.\nBe edgy in your cooking ideas.\nUSER: How do I make pasta?\nASSISTANT: First, boil water. Then, add pasta to the boiling water. Cook for 8-10 minutes or until al dente. Drain and serve!\nUSER: How do I make it better?\nASSISTANT:",
    stream=stream,
    max_tokens=max_tokens,
)

if stream:
    for chunk in completion_res:
        print(chunk.choices[0].text or "", end="")
else:
    print(completion_res.choices[0].text)
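The Completions example writes the chat transcript by hand as a single prompt string. If you build such prompts programmatically, a small helper like the one below can assemble them. This is a sketch only: build_prompt is a hypothetical name, and the "USER:/ASSISTANT:" template is simply the one used in the example above, not necessarily the chat format a given model was trained on.

```python
# Hypothetical helper: rebuilds the "USER:/ASSISTANT:" transcript used in the
# Completions example. The template is copied from that example; other models
# may expect a different chat format.
def build_prompt(preamble: list[str], turns: list[tuple[str, str]], next_user: str) -> str:
    """Join preamble lines, prior (user, assistant) turns, and the new user message."""
    lines = list(preamble)
    for user_msg, assistant_msg in turns:
        lines.append(f"USER: {user_msg}")
        lines.append(f"ASSISTANT: {assistant_msg}")
    lines.append(f"USER: {next_user}")
    # End with an open assistant tag so the model continues as the assistant.
    lines.append("ASSISTANT:")
    return "\n".join(lines)


prompt = build_prompt(
    preamble=[
        "A chat between a curious user and an artificial intelligence assistant.",
        "You are a cooking assistant.",
        "Be edgy in your cooking ideas.",
    ],
    turns=[(
        "How do I make pasta?",
        "First, boil water. Then, add pasta to the boiling water. "
        "Cook for 8-10 minutes or until al dente. Drain and serve!",
    )],
    next_user="How do I make it better?",
)
print(prompt)
```

Passing this prompt as the prompt argument of client.completions.create reproduces the request above. Because the model may go on to write its own "USER:" turn, adding "USER:" to the stop parameter is a reasonable choice with this template.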

Example with curl client

  • Chat Completions API
# To create a Novita AI API key, see: https://novita.ai/get-started/Quick_Start.html#_3-create-an-api-key
export API_KEY="{YOUR Novita AI API Key}"

curl "https://api.novita.ai/v3/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{
    "model": "Nous-Hermes-2-Mixtral-8x7B-DPO",
    "messages": [
        {
            "role": "system",
            "content": "Act like you are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Hi there!"
        }
    ],
    "max_tokens": 512
}'
  • Completions API
# To create a Novita AI API key, see: https://novita.ai/get-started/Quick_Start.html#_3-create-an-api-key
export API_KEY="{YOUR Novita AI API Key}"

curl "https://api.novita.ai/v3/openai/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{
    "model": "Nous-Hermes-2-Mixtral-8x7B-DPO",
    "prompt": "A chat between a curious user and an artificial intelligence assistant.\nYou are a cooking assistant.\nBe edgy in your cooking ideas.\nUSER: How do I make pasta?\nASSISTANT: First, boil water. Then, add pasta to the boiling water. Cook for 8-10 minutes or until al dente. Drain and serve!\nUSER: How do I make it better?\nASSISTANT:",
    "max_tokens": 512
}'
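If you'd rather not install the openai package, the curl calls above translate directly to Python's standard library. The sketch below assumes the request shape shown in the Chat Completions curl example and the streaming format described for the stream parameter below (data-only server-sent events ending with data: [DONE]). build_chat_request and iter_sse_content are hypothetical helper names, not part of any Novita AI SDK, and the code has not been checked against every edge case of the service's responses.

```python
import json
import urllib.request

# Base URL from this page; the helper names below are hypothetical.
BASE_URL = "https://api.novita.ai/v3/openai"


def build_chat_request(api_key, model, messages, max_tokens=512, stream=False):
    """Build (but do not send) a POST equivalent to the Chat Completions curl call."""
    body = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "stream": stream,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def iter_sse_content(lines):
    """Yield delta.content text from data-only server-sent events.

    lines can be any iterable of str or bytes lines, e.g. an HTTP response
    object. Iteration stops at the "data: [DONE]" terminator.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

To send a streaming request, open it with urllib.request.urlopen and pass the response to iter_sse_content; this works because an http.client response iterates over its body line by line. With stream=False, json.load(response)["choices"][0]["message"]["content"] should give the reply instead. Error handling (for example urllib.error.HTTPError on an invalid key) is left out for brevity.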

If you're already using OpenAI's Chat Completions endpoint, switching takes three steps: set base_url to https://api.novita.ai/v3/openai, obtain and set your Novita AI API key (detailed instructions are available at https://novita.ai/get-started/Quick_Start.html#_3-create-an-api-key), and change the model name to one of the models we support.

Model parameters

Please note that we're not yet 100% compatible with the OpenAI API. If you run into any issues, you can report them in the #issues channel of our Discord server. Supported request parameters include:

Chat Completions and Completions:

  • model, the model to use. Find all the models we support here: https://novita.ai/llm-api.
  • messages, a list of messages with the roles system, user, and assistant (only available for the Chat Completions endpoint). Each message has:
    • content, the contents of the message.
    • role, the role of the message's author: system, user, or assistant.
    • name, an optional name for the participant. It gives the model information to differentiate between participants of the same role.
  • prompt, the prompt(s) to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays (only available for the Completions endpoint).
  • max_tokens, the maximum number of tokens that can be generated in the completion.
  • stream, if set, partial message deltas will be sent, as in ChatGPT. Tokens are sent as data-only server-sent events as they become available, and the stream is terminated by a data: [DONE] message.
  • temperature, the sampling temperature to use, between 0 and 2. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. We generally recommend altering this or top_p but not both.
  • top_p, an alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens within the top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
  • stop, up to 4 sequences where the API will stop generating further tokens.
  • n, how many completion choices to generate for each input. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
  • presence_penalty, a float that penalizes new tokens based on whether they appear in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.
  • frequency_penalty, a float that penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.
  • repetition_penalty, a float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens.
  • logit_bias, an optional parameter that modifies the likelihood of specified tokens appearing in the model-generated output.
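The sampling and penalty parameters above all act on the model's next-token scores (logits). The sketch below shows the conventional way they are applied, for intuition only. It assumes the formula OpenAI has published for presence_penalty and frequency_penalty and the Hugging Face transformers style of repetition_penalty; the order of the steps and all function names are likewise assumptions, and Novita AI's serving stack may implement these parameters differently.

```python
import math
import random
from collections import Counter


def apply_penalties(logits, prompt_ids, generated_ids,
                    presence_penalty=0.0, frequency_penalty=0.0,
                    repetition_penalty=1.0):
    """Return a penalized copy of logits (a list indexed by token id)."""
    out = list(logits)
    # repetition_penalty, Hugging Face transformers style: for every token already
    # seen in the prompt or the output, divide a positive logit or multiply a
    # negative one, so values > 1 make repeats less likely.
    for tok in set(prompt_ids) | set(generated_ids):
        if out[tok] > 0:
            out[tok] /= repetition_penalty
        else:
            out[tok] *= repetition_penalty
    # presence_penalty / frequency_penalty, following the formula OpenAI has
    # published: subtract count * frequency_penalty, plus presence_penalty once
    # for any token that has appeared in the generated text so far.
    for tok, count in Counter(generated_ids).items():
        out[tok] -= count * frequency_penalty + presence_penalty
    return out


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("this sketch assumes temperature > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Nucleus sampling: keep the smallest set of most likely tokens whose
    cumulative probability reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def sample_next(logits, temperature=1.0, top_p=1.0, rng=random):
    """Draw one token id after temperature scaling and top_p filtering."""
    candidates = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    tokens = list(candidates)
    return rng.choices(tokens, weights=[candidates[t] for t in tokens], k=1)[0]
```

For example, with logits [2.0, 1.0, 0.0, -1.0], prompt_ids=[0], generated_ids=[1, 1], presence_penalty=0.5, frequency_penalty=0.25 and repetition_penalty=2.0, apply_penalties returns [1.0, -0.5, 0.0, -1.0]: token 0 is halved by the repetition penalty, and token 1 is halved to 0.5 and then reduced by 2 × 0.25 + 0.5 = 1.0. This also illustrates why changing both temperature and top_p at once is hard to reason about: temperature reshapes the distribution before top_p decides which tokens survive.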