LLM API
Introduction
We provide compatibility with the OpenAI API standard, allowing for easier integration into existing applications.
The API Base URL is https://api.novita.ai/v3/openai.
The APIs we support are:
- Chat Completion, both streaming and regular.
- Completion, both streaming and regular.
You can find all the models we support here: https://novita.ai/llm-api.
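Because the API follows the OpenAI standard, the model catalog can likely also be queried programmatically through a /models endpoint. The sketch below is hedged: the endpoint path is assumed from the OpenAI convention rather than confirmed by this page, so it only builds the request without sending it.

```python
# Sketch: query the model catalog via the OpenAI-compatible /models
# endpoint (assumed to exist per the OpenAI API convention; verify
# against https://novita.ai/llm-api before relying on it).
import urllib.request

BASE_URL = "https://api.novita.ai/v3/openai"

def build_models_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the model list."""
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )

req = build_models_request("<YOUR Novita AI API Key>")
# Send with urllib.request.urlopen(req) once a real key is set.
print(req.full_url)
```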
Example with Python Client
pip install 'openai>=1.0.0'
- Chat Completions API
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    # Get the Novita AI API Key by referring to: https://novita.ai/docs/get-started/quickstart.html#_2-manage-api-key.
    api_key="<YOUR Novita AI API Key>",
)

model = "Nous-Hermes-2-Mixtral-8x7B-DPO"
stream = True  # or False
max_tokens = 512

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": "Act like you are a helpful assistant.",
        },
        {
            "role": "user",
            "content": "Hi there!",
        },
    ],
    stream=stream,
    max_tokens=max_tokens,
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
- Completions API
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    # Get the Novita AI API Key by referring to: https://novita.ai/docs/get-started/quickstart.html#_2-manage-api-key.
    api_key="<YOUR Novita AI API Key>",
)

model = "Nous-Hermes-2-Mixtral-8x7B-DPO"
stream = True  # or False
max_tokens = 512

completion_res = client.completions.create(
    model=model,
    prompt="A chat between a curious user and an artificial intelligence assistant.\nYou are a cooking assistant.\nBe edgy in your cooking ideas.\nUSER: How do I make pasta?\nASSISTANT: First, boil water. Then, add pasta to the boiling water. Cook for 8-10 minutes or until al dente. Drain and serve!\nUSER: How do I make it better?\nASSISTANT:",
    stream=stream,
    max_tokens=max_tokens,
)

if stream:
    for chunk in completion_res:
        print(chunk.choices[0].text or "", end="")
else:
    print(completion_res.choices[0].text)
Example with Curl Client
- Chat Completions API
# Get the Novita AI API Key by referring to: https://novita.ai/docs/get-started/quickstart.html#_2-manage-api-key
export API_KEY="{YOUR Novita AI API Key}"

curl "https://api.novita.ai/v3/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{
    "model": "Nous-Hermes-2-Mixtral-8x7B-DPO",
    "messages": [
      {
        "role": "system",
        "content": "Act like you are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hi there!"
      }
    ],
    "max_tokens": 512
  }'
- Completions API
# Get the Novita AI API Key by referring to: https://novita.ai/docs/get-started/quickstart.html#_2-manage-api-key
export API_KEY="{YOUR Novita AI API Key}"

curl "https://api.novita.ai/v3/openai/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{
    "model": "Nous-Hermes-2-Mixtral-8x7B-DPO",
    "prompt": "A chat between a curious user and an artificial intelligence assistant.\nYou are a cooking assistant.\nBe edgy in your cooking ideas.\nUSER: How do I make pasta?\nASSISTANT: First, boil water. Then, add pasta to the boiling water. Cook for 8-10 minutes or until al dente. Drain and serve!\nUSER: How do I make it better?\nASSISTANT:",
    "max_tokens": 512
  }'
If you're already using OpenAI's chat completion endpoint, you can simply set the base URL to https://api.novita.ai/v3/openai, obtain and set your API Key (detailed instructions are available at https://novita.ai/docs/get-started/quickstart.html#_2-manage-api-key), and update the model name according to your needs. With these steps, you're good to go.
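For applications that do not use the official openai client, the same request can be made with only the standard library. This is a minimal sketch mirroring the curl example above; it builds the POST request without sending it, and the model name and payload fields are taken from the earlier examples.

```python
# Minimal sketch of a chat completion request built with the standard
# library only. Mirrors the curl example: same endpoint, headers, and
# JSON payload. Send with urllib.request.urlopen(req) once a real API
# key is in place.
import json
import urllib.request

BASE_URL = "https://api.novita.ai/v3/openai"

def build_chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) the POST request for a chat completion."""
    payload = {"model": model, "messages": messages, "max_tokens": 512}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request(
    "<YOUR Novita AI API Key>",
    "Nous-Hermes-2-Mixtral-8x7B-DPO",
    [{"role": "user", "content": "Hi there!"}],
)
print(req.full_url)
```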
Model Parameters
Please note that we are not yet 100% compatible. If you encounter any issues, you can start a discussion in the #issues channel of our Discord server.
ChatCompletions and Completions:
- model: Find all the models we support here: https://novita.ai/llm-api.
- messages: A list of messages comprising the conversation, with the roles system, user, and assistant. (Only available for the ChatCompletion endpoint.) Each message has:
  - content: The contents of the message.
  - role: The role of the message's author, in this case system.
  - name: An optional name for the participant. Provides the model information to differentiate between participants of the same role.
- prompt: The prompt(s) to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. (Only available for the Completion endpoint.)
- max_tokens: The maximum number of tokens that can be generated in the chat completion.
- stream: If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
- temperature: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
- top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
- stop: Up to 4 sequences where the API will stop generating further tokens.
- n: How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
- presence_penalty: Float that penalizes new tokens based on whether they appear in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.
- frequency_penalty: Float that penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.
- repetition_penalty: Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens.
- logit_bias: An optional parameter that modifies the likelihood of specified tokens appearing in a model generated output.
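The parameters above can be combined in a single request body. The sketch below is illustrative only: the values are examples, not recommendations, and following the guidance above it sets temperature but leaves top_p at its default rather than altering both.

```python
import json

# Illustrative request body combining several of the parameters above.
# Per the guidance, temperature is set and top_p is left at its default.
payload = {
    "model": "Nous-Hermes-2-Mixtral-8x7B-DPO",
    "messages": [{"role": "user", "content": "How do I make pasta better?"}],
    "max_tokens": 256,
    "temperature": 0.8,        # higher -> more random output
    "stop": ["USER:"],         # up to 4 stop sequences allowed
    "n": 1,                    # keep n as 1 to minimize costs
    "presence_penalty": 0.5,   # > 0 encourages new tokens
    "frequency_penalty": 0.5,  # > 0 penalizes frequently used tokens
}

# This JSON string is what would be sent as the request body.
print(json.dumps(payload, indent=2))
```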