LLM API
Introduction
We provide compatibility with the OpenAI API standard, allowing for easier integration into existing applications.
The API Base URL
https://api.novita.ai/v3/openai
The APIs we support are:
- Chat Completion, both streaming and regular.
- Completion, both streaming and regular.
The Models we support are:
You can find all the models we support here: https://novita.ai/llm-api, or query the models endpoint to list all available models.
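If the endpoint follows the OpenAI standard's model listing (an assumption here; the page above is the authoritative list), you can also enumerate models programmatically with the Python client:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR Novita AI API Key>",
)

# client.models.list() calls GET {base_url}/models and yields Model objects.
for model in client.models.list():
    print(model.id)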
Example with Python Client
pip install 'openai>=1.0.0'
- Chat Completions API
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    # Get the Novita AI API Key by referring to: https://novita.ai/docs/get-started/quickstart.html#_2-manage-api-key
    api_key="<YOUR Novita AI API Key>",
)

model = "meta-llama/llama-3.1-8b-instruct"
stream = True  # or False
max_tokens = 512

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": "Act like you are a helpful assistant.",
        },
        {
            "role": "user",
            "content": "Hi there!",
        },
    ],
    stream=stream,
    max_tokens=max_tokens,
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
- Completions API
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    # Get the Novita AI API Key by referring to: https://novita.ai/docs/get-started/quickstart.html#_2-manage-api-key
    api_key="<YOUR Novita AI API Key>",
)

model = "meta-llama/llama-3.1-8b-instruct"
stream = True  # or False
max_tokens = 512

completion_res = client.completions.create(
    model=model,
    prompt="A chat between a curious user and an artificial intelligence assistant.\nYou are a cooking assistant.\nBe edgy in your cooking ideas.\nUSER: How do I make pasta?\nASSISTANT: First, boil water. Then, add pasta to the boiling water. Cook for 8-10 minutes or until al dente. Drain and serve!\nUSER: How do I make it better?\nASSISTANT:",
    stream=stream,
    max_tokens=max_tokens,
)

if stream:
    for chunk in completion_res:
        print(chunk.choices[0].text or "", end="")
else:
    print(completion_res.choices[0].text)
Example with Curl Client
- Chat Completions API
# Get the Novita AI API Key by referring to: https://novita.ai/docs/get-started/quickstart.html#_2-manage-api-key
export API_KEY="{YOUR Novita AI API Key}"
curl "https://api.novita.ai/v3/openai/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${API_KEY}" \
-d '{
"model": "meta-llama/llama-3.1-8b-instruct",
"messages": [
{
"role": "system",
"content": "Act like you are a helpful assistant."
},
{
"role": "user",
"content": "Hi there!"
}
],
"max_tokens": 512
}'
- Completions API
# Get the Novita AI API Key by referring to: https://novita.ai/docs/get-started/quickstart.html#_2-manage-api-key
export API_KEY="{YOUR Novita AI API Key}"
curl "https://api.novita.ai/v3/openai/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${API_KEY}" \
-d '{
"model": "meta-llama/llama-3.1-8b-instruct",
"prompt": "A chat between a curious user and an artificial intelligence assistant.\nYou are a cooking assistant.\nBe edgy in your cooking ideas.\nUSER: How do I make pasta?\nASSISTANT: First, boil water. Then, add pasta to the boiling water. Cook for 8-10 minutes or until al dente. Drain and serve!\nUSER: How do I make it better?\nASSISTANT:",
"max_tokens": 512
}'
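Both curl requests above return a single JSON response. To stream instead, add "stream": true to the request body; per the stream parameter documented below, tokens are then sent as data-only server-sent events terminated by a data: [DONE] message. A minimal sketch:

curl "https://api.novita.ai/v3/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{
    "model": "meta-llama/llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hi there!"}],
    "max_tokens": 512,
    "stream": true
  }'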
If you're already using OpenAI's chat completion endpoint, you can simply set the base URL to https://api.novita.ai/v3/openai, obtain and set your API Key (detailed instructions are available at https://novita.ai/docs/get-started/quickstart.html#_2-manage-api-key), and update the model name according to your needs. With these steps, you're good to go, as the short example below shows.
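For example, if your existing code constructs an OpenAI client, only the client construction and the model name need to change (the model below is the one used throughout this page):

from openai import OpenAI

# Before: client = OpenAI(api_key="<YOUR OpenAI API Key>")
# After: the same client, pointed at the Novita AI base URL.
client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR Novita AI API Key>",
)

res = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",  # pick any model from https://novita.ai/llm-api
    messages=[{"role": "user", "content": "Hi there!"}],
)
print(res.choices[0].message.content)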
Model Parameters
Please note that we are not yet 100% compatible with the OpenAI API. If you encounter any issues, you can start a discussion in the #issues channel of our Discord server.
The following parameters apply to both the ChatCompletions and Completions endpoints unless noted otherwise (a sketch combining several of them follows the list):
- model, find all the models we support here: https://novita.ai/llm-api.
- messages, the messages comprising the conversation, with the roles system, user, and assistant. (only available for the ChatCompletion endpoint)
  - content, the contents of the message.
  - role, the role of the message's author.
  - name, an optional name for the participant. Provides the model information to differentiate between participants of the same role.
- prompt, the prompt(s) to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. (only available for the Completion endpoint)
- max_tokens, the maximum number of tokens that can be generated in the chat completion.
- stream, if set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
- temperature, what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
- top_p, an alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
- top_k, integer that controls the number of top tokens to consider. Set to -1 to consider all tokens.
- min_p, float that represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this.
- stop, up to 4 sequences where the API will stop generating further tokens.
- n, how many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
- presence_penalty, float that penalizes new tokens based on whether they appear in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.
- frequency_penalty, float that penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.
- repetition_penalty, float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens.
- logit_bias, an optional parameter that modifies the likelihood of specified tokens appearing in a model-generated output.
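As referenced above, here is a short sketch combining several of these parameters in one call. The OpenAI Python client exposes temperature, stop, presence_penalty, and max_tokens directly; top_k, min_p, and repetition_penalty are not part of the client's signature, so they are passed via extra_body (assuming the server reads them from the request body, as the parameter list implies):

from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR Novita AI API Key>",
)

res = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Suggest an edgy pasta dish."}],
    max_tokens=256,
    temperature=0.8,       # more random output; alter this or top_p, not both
    presence_penalty=0.5,  # > 0 encourages new tokens
    stop=["USER:"],        # up to 4 stop sequences
    # Fields the OpenAI client does not define natively go in extra_body.
    extra_body={"top_k": 40, "min_p": 0.05, "repetition_penalty": 1.1},
)
print(res.choices[0].message.content)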
Error codes
If the response status code is not 200, we will return the error code and message in JSON format in the response body. The format is as follows:
{
  "code": integer,
  "reason": "string",
  "message": "string"
}
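A minimal sketch of checking for this error shape with the requests library (assuming a non-200 body is always JSON in the format above):

import requests

resp = requests.post(
    "https://api.novita.ai/v3/openai/chat/completions",
    headers={"Authorization": "Bearer <YOUR Novita AI API Key>"},
    json={
        "model": "meta-llama/llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Hi there!"}],
        "max_tokens": 512,
    },
)

if resp.status_code != 200:
    err = resp.json()  # {"code": ..., "reason": ..., "message": ...}
    print(f"{err['code']} {err['reason']}: {err['message']}")
else:
    print(resp.json()["choices"][0]["message"]["content"])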
Error codes overview
| Code | Reason | Description |
|---|---|---|
| 401 | INVALID_API_KEY | The API key is invalid. You can check and manage your API key by referring to: https://novita.ai/docs/get-started/quickstart.html#_2-manage-api-key. |
| 403 | NOT_ENOUGH_BALANCE | Your credit is not enough. Please top up your credit. |
| 404 | MODEL_NOT_FOUND | The requested model is not found. You can find all the models we support here: https://novita.ai/llm-api, or query the models endpoint to list all available models. |
| 429 | RATE_LIMIT_EXCEEDED | You have exceeded the rate limit. Please slow down your requests and retry. |
| 500 | MODEL_NOT_AVAILABLE | The requested model is not available now. This is usually due to the model being under maintenance. You can contact us on our Discord server for more information. |