LLM API

Introduction

We provide compatibility with the OpenAI API standard, allowing easier integration into existing applications. The api_base is https://api.novita.ai/v3/openai.

The APIs we support are:

  • Chat Completions API
  • Completions API

You can find all the models we support here: https://novita.ai/llm-api.

Example with Python client

pip install 'openai>=1.0.0'
  • Chat Completions API
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    # Get the NovitaAI API Key by referring: https://novita.ai/get-started/Quick_Start.html#_3-create-an-api-key
    api_key="<YOUR NovitaAI API Key>",
)

model = "Nous-Hermes-2-Mixtral-8x7B-DPO"
stream = True # or False
max_tokens = 512

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": "Act like you are a helpful assistant.",
        },
        {
            "role": "user",
            "content": "Hi there!",
        }
    ],
    stream=stream,
    max_tokens=max_tokens,
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
  • Completions API
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    # Get the NovitaAI API Key by referring: https://novita.ai/get-started/Quick_Start.html#_3-create-an-api-key
    api_key="<YOUR NovitaAI API Key>",
)

model = "Nous-Hermes-2-Mixtral-8x7B-DPO"
stream = True # or False
max_tokens = 512

completion_res = client.completions.create(
    model=model,
    prompt="A chat between a curious user and an artificial intelligence assistant.\nYou are a cooking assistant.\nBe edgy in your cooking ideas.\nUSER: How do I make pasta?\nASSISTANT: First, boil water. Then, add pasta to the boiling water. Cook for 8-10 minutes or until al dente. Drain and serve!\nUSER: How do I make it better?\nASSISTANT:",
    stream=stream,
    max_tokens=max_tokens,
)

if stream:
    for chunk in completion_res:
        print(chunk.choices[0].text or "", end="")
else:
    print(completion_res.choices[0].text)

Example with curl client

  • Chat Completions API
# Get the NovitaAI API Key by referring: https://novita.ai/get-started/Quick_Start.html#_3-create-an-api-key
export API_KEY="{YOUR NovitaAI API Key}"

curl "https://api.novita.ai/v3/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{
    "model": "Nous-Hermes-2-Mixtral-8x7B-DPO",
    "messages": [
        {
            "role": "system",
            "content": "Act like you are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Hi there!"
        }
    ],
    "max_tokens": 512
}'
  • Completions API
# Get the NovitaAI API Key by referring: https://novita.ai/get-started/Quick_Start.html#_3-create-an-api-key
export API_KEY="{YOUR NovitaAI API Key}"

curl "https://api.novita.ai/v3/openai/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{
    "model": "Nous-Hermes-2-Mixtral-8x7B-DPO",
    "prompt": "A chat between a curious user and an artificial intelligence assistant.\nYou are a cooking assistant.\nBe edgy in your cooking ideas.\nUSER: How do I make pasta?\nASSISTANT: First, boil water. Then, add pasta to the boiling water. Cook for 8-10 minutes or until al dente. Drain and serve!\nUSER: How do I make it better?\nASSISTANT:",
    "max_tokens": 512
}'

If you're already using OpenAI's chat completion endpoint, you can simply set the base_url to https://api.novita.ai/v3/openai, obtain and set your API Key (detailed instructions available at https://novita.ai/get-started/Quick_Start.html#_3-create-an-api-key), and update the model name according to your needs. With these steps, you're good to go.
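For an existing OpenAI integration, the switch typically touches only the base_url, the API key, and the model name. This is a minimal sketch of that change (the model name here is just one of the supported models):

from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",  # previously the OpenAI default
    api_key="<YOUR NovitaAI API Key>",
)

res = client.chat.completions.create(
    model="Nous-Hermes-2-Mixtral-8x7B-DPO",  # swap in any supported model
    messages=[{"role": "user", "content": "Hi there!"}],
)
print(res.choices[0].message.content)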

Model parameters

Please note that we're not yet 100% compatible. If you have any other issues, you can raise them in the #issues channel of our Discord server. Supported request parameters include:

ChatCompletions and Completions:

  • model, find all the models we support here: https://novita.ai/llm-api.
  • messages, including the roles system, user, and assistant. (only available for the Chat Completions endpoint)
  • prompt, the prompt(s) to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. (only available for the Completions endpoint)
  • max_tokens, the maximum number of tokens that can be generated in the chat completion.
  • stream, if set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. (see the raw-stream sketch after this list)
  • temperature, what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both. (a combined example follows this list)
  • top_p, an alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
  • stop, up to 4 sequences where the API will stop generating further tokens.
  • n, how many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
  • presence_penalty, a number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
  • frequency_penalty, a number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
  • logit_bias, an optional parameter that modifies the likelihood of specified tokens appearing in model-generated output.
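To show how these parameters fit together, here is a sketch of a single request combining several of them; the values are illustrative, not recommendations:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR NovitaAI API Key>",
)

res = client.chat.completions.create(
    model="Nous-Hermes-2-Mixtral-8x7B-DPO",
    messages=[{"role": "user", "content": "Suggest a name for a cooking blog."}],
    max_tokens=128,
    temperature=0.8,        # higher values make the output more random
    # top_p=0.9,            # alternative to temperature; alter one, not both
    stop=["\n\n"],          # stop at the first blank line
    n=2,                    # two choices; you are billed for all generated tokens
    presence_penalty=0.5,   # nudge the model toward new topics
    frequency_penalty=0.5,  # discourage verbatim repetition
)

for choice in res.choices:
    print(choice.message.content)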
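The stream behavior described above can also be consumed without the OpenAI client. This is a minimal sketch that parses the data-only server-sent events by hand, assuming the third-party requests package is installed:

import json
import os
import requests

resp = requests.post(
    "https://api.novita.ai/v3/openai/chat/completions",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['API_KEY']}",
    },
    json={
        "model": "Nous-Hermes-2-Mixtral-8x7B-DPO",
        "messages": [{"role": "user", "content": "Hi there!"}],
        "max_tokens": 512,
        "stream": True,
    },
    stream=True,
)

for line in resp.iter_lines():
    if not line:
        continue  # skip SSE keep-alive / blank separator lines
    decoded = line.decode("utf-8")
    if not decoded.startswith("data: "):
        continue
    payload = decoded[len("data: "):]
    if payload == "[DONE]":  # the stream terminator described above
        break
    chunk = json.loads(payload)
    print(chunk["choices"][0]["delta"].get("content") or "", end="")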