POST https://api.novita.ai/v3/async/minimax-speech-2.8-turbo
MiniMax Speech 2.8 Turbo Async Text-to-Speech
curl --request POST \
  --url https://api.novita.ai/v3/async/minimax-speech-2.8-turbo \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: <content-type>' \
  --data '
{
  "text": "<string>",
  "text_file_id": 123,
  "voice_modify": {
    "pitch": 123,
    "timbre": 123,
    "intensity": 123,
    "sound_effects": "<string>"
  },
  "audio_setting": {
    "format": "<string>",
    "bitrate": 123,
    "channel": 123,
    "audio_sample_rate": 123
  },
  "voice_setting": {
    "vol": 123,
    "pitch": 123,
    "speed": 123,
    "emotion": "<string>",
    "voice_id": "<string>",
    "english_normalization": true
  },
  "aigc_watermark": true,
  "language_boost": "<string>",
  "continuous_sound": true,
  "pronunciation_dict": {
    "tone": [
      {}
    ]
  }
}
'
{
  "file_id": 123,
  "task_id": "<string>",
  "base_resp": {
    "status_msg": "<string>",
    "status_code": 123
  },
  "task_token": "<string>",
  "usage_characters": 123
}
MiniMax asynchronous text-to-speech API. It supports voice, emotion, speed, and other parameter settings. Inline text is limited to 50,000 characters; file input supports up to 100,000 characters.
This is an asynchronous API: the response identifies the created task and does not contain the audio itself. Use the task_id to query the Task Result API and retrieve the audio generation results.

Request Headers

Content-Type
string
required
Supports: application/json
Authorization
string
required
Bearer authentication format, for example: Bearer {{API Key}}.
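
As a minimal sketch of how the two headers fit together, the following builds (but does not send) the request with Python's standard library. The helper name build_tts_request, the placeholder API key, and the placeholder voice_id are illustrative assumptions; only the URL, the header names, and the body field names come from this page.

```python
import json
import urllib.request

API_URL = "https://api.novita.ai/v3/async/minimax-speech-2.8-turbo"


def build_tts_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request; the caller decides when to send it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: substitute a real API key and a valid voice_id.
request = build_tts_request(
    "YOUR_API_KEY",
    {"text": "Hello there.", "voice_setting": {"voice_id": "YOUR_VOICE_ID"}},
)
# To submit: urllib.request.urlopen(request), then parse the JSON response body.
```

Keeping request construction separate from sending makes it easy to inspect the headers and body before any network call is made.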

Request Body

text
string
Text to synthesize into audio, maximum length is 50,000 characters. Either text or text_file_id is required.

• Interjection tags: Only supported when the model is speech-2.8-hd or speech-2.8-turbo. Each supported tag is listed with its meaning in parentheses: (laughs) (laughter), (chuckle) (light laugh), (coughs) (cough), (clear-throat) (clear throat), (groans) (groan), (breath) (normal breathing), (pant) (panting), (inhale) (inhale), (exhale) (exhale), (gasps) (gasp), (sniffs) (sniff), (sighs) (sigh), (snorts) (snort), (burps) (burp), (lip-smacking) (lip smacking), (humming) (humming), (hissing) (hissing), (emm) (um), (whistles) (whistle), (sneezes) (sneeze), (crying) (crying), (applause) (applause)
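
The length limit and the tag list above can be checked client-side before submitting. This is only a sketch under stated assumptions: the helper check_text is hypothetical, it assumes the first item of each pair in the list is the literal tag, it counts characters with Python's len (the service may count differently), and it will also report ordinary single-word parenthetical remarks as unknown tags.

```python
import re

MAX_TEXT_CHARS = 50_000

# ASSUMPTION: the first form of each pair in the list above is the literal tag.
SUPPORTED_TAGS = {
    "(laughs)", "(chuckle)", "(coughs)", "(clear-throat)", "(groans)",
    "(breath)", "(pant)", "(inhale)", "(exhale)", "(gasps)", "(sniffs)",
    "(sighs)", "(snorts)", "(burps)", "(lip-smacking)", "(humming)",
    "(hissing)", "(emm)", "(whistles)", "(sneezes)", "(crying)", "(applause)",
}


def check_text(text: str) -> tuple[list[str], list[str]]:
    """Return (recognized_tags, unknown_tags); raise if the text is too long."""
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text has {len(text)} characters, limit is {MAX_TEXT_CHARS}")
    # Candidate tags: one lowercase word (hyphens allowed) inside parentheses.
    candidates = re.findall(r"\([a-z-]+\)", text)
    recognized = [tag for tag in candidates if tag in SUPPORTED_TAGS]
    unknown = [tag for tag in candidates if tag not in SUPPORTED_TAGS]
    return recognized, unknown


recognized, unknown = check_text("Well (sighs) that was close (laughs) (giggle)")
print(recognized)  # ['(sighs)', '(laughs)']
print(unknown)     # ['(giggle)']
```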
text_file_id
integer
ID of a text file to synthesize. A single file must be under 100,000 characters. Supported file formats: txt, zip; the format is validated automatically. Either text or text_file_id is required.
• txt file: Length limit <100000 characters. Supports custom pauses using the <#x#> tag, where x is the pause duration in seconds, in the range [0.01, 99.99], with up to 2 decimal places. A pause must be placed between two pronounceable text segments, and multiple pause tags cannot be used consecutively.
• zip file:
• Compressed package must contain txt or json files of the same format.
• json file format: Supports [title, content, extra] three fields, representing title, body, and additional information. If all three fields exist, 3 groups of results will be produced, 9 files in total, stored in one folder. If a field does not exist or is empty, no corresponding result will be generated
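
The pause-tag rules for txt files can be checked before upload. The following is a rough sketch, not the service's own validation: the function name pause_tag_problems is hypothetical, and "pronounceable text" is approximated as "contains at least one letter or digit", which is an assumption.

```python
import re
from decimal import Decimal

PAUSE_TAG = re.compile(r"<#(.*?)#>")


def pause_tag_problems(text: str) -> list[str]:
    """Return a list of rule violations for <#x#> pause tags (empty if none)."""
    problems = []
    # Rule 1: x is a number in [0.01, 99.99] with at most 2 decimal places.
    for match in PAUSE_TAG.finditer(text):
        raw = match.group(1)
        if not re.fullmatch(r"\d+(\.\d{1,2})?", raw):
            problems.append(f"{match.group(0)}: not a number with up to 2 decimals")
        elif not Decimal("0.01") <= Decimal(raw) <= Decimal("99.99"):
            problems.append(f"{match.group(0)}: duration outside [0.01, 99.99]")
    # Rule 2: every tag sits between two pronounceable segments, which also
    # rules out consecutive tags (they leave an empty segment between them).
    segments = re.split(r"<#.*?#>", text)
    if len(segments) > 1:
        for index, segment in enumerate(segments):
            # ASSUMPTION: "pronounceable" means at least one letter or digit.
            if not any(ch.isalnum() for ch in segment):
                problems.append(f"segment {index}: no pronounceable text next to a pause tag")
    return problems


print(pause_tag_problems("Hello<#1.5#>world"))         # []
print(pause_tag_problems("Hello<#0.5#><#0.5#>world"))  # one problem: empty middle segment
```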
voice_modify
object
audio_setting
object
voice_setting
object
required
aigc_watermark
boolean
default:false
Controls whether an audio rhythm identifier is added at the end of the synthesized audio. Defaults to false. This parameter only takes effect for non-streaming synthesis.
language_boost
string
Whether to enhance recognition of the specified minor languages and dialects. Defaults to null; set it to auto to let the model decide automatically. Optional values: Chinese, Chinese,Yue, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans, auto
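
Because language_boost accepts a fixed set of strings, a typo is easy to catch before the request is sent. A minimal sketch follows; the helper validate_language_boost is hypothetical, and treating "Chinese,Yue" as a single value that contains a comma is an assumption drawn from how the list above is written.

```python
# Values copied from the list above.
# ASSUMPTION: "Chinese,Yue" is one value, not two separate entries.
LANGUAGE_BOOST_VALUES = {
    "Chinese", "Chinese,Yue", "English", "Arabic", "Russian", "Spanish",
    "French", "Portuguese", "German", "Turkish", "Dutch", "Ukrainian",
    "Vietnamese", "Indonesian", "Japanese", "Italian", "Korean", "Thai",
    "Polish", "Romanian", "Greek", "Czech", "Finnish", "Hindi", "Bulgarian",
    "Danish", "Hebrew", "Malay", "Persian", "Slovak", "Swedish", "Croatian",
    "Filipino", "Hungarian", "Norwegian", "Slovenian", "Catalan", "Nynorsk",
    "Tamil", "Afrikaans", "auto",
}


def validate_language_boost(value):
    """Return value unchanged if it is None (the default) or a listed option."""
    if value is None or value in LANGUAGE_BOOST_VALUES:
        return value
    raise ValueError(f"unsupported language_boost value: {value!r}")


validate_language_boost("auto")         # accepted
validate_language_boost("Chinese,Yue")  # accepted as a single value
```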
continuous_sound
boolean
default:false
When enabled, makes transitions between clauses sound more natural. Only supported by the speech-2.8-hd and speech-2.8-turbo models.
pronunciation_dict
object

Response

file_id
integer
ID of the audio file returned after the task is created. This field is not returned when the request fails.
Note: The download URL is valid for 9 hours (32400 seconds) from generation. After it expires, the file becomes invalid and the generated content is lost, so download the file before the deadline.
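
The 9-hour validity window can be tracked on the client. This sketch assumes the caller records the generation time itself (this page does not show a timestamp field in the response), and the helper names download_deadline and is_expired are illustrative.

```python
from datetime import datetime, timedelta, timezone

# 32400 seconds = 9 hours, as stated in the note above.
DOWNLOAD_URL_TTL = timedelta(seconds=32400)


def download_deadline(generated_at: datetime) -> datetime:
    """Latest moment at which the download URL is still expected to work."""
    return generated_at + DOWNLOAD_URL_TTL


def is_expired(generated_at: datetime, now=None) -> bool:
    """True once the validity window has fully elapsed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= download_deadline(generated_at)


# Example with a made-up generation time.
generated = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(download_deadline(generated).isoformat())  # 2025-01-01T21:00:00+00:00
```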
task_id
string
Use the task_id to retrieve the generated outputs.
base_resp
object
task_token
string
Token used to complete the current task
usage_characters
integer
Billable character count
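
A possible way to read the response fields above is sketched below. Two things are assumptions rather than facts from this page: that a base_resp.status_code of 0 means success (the status codes are not listed here), and the sample values, which are made up for illustration. The names TaskCreated and parse_create_response are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskCreated:
    task_id: str
    file_id: Optional[int]  # not returned when the request fails
    task_token: Optional[str]
    usage_characters: Optional[int]


def parse_create_response(body: str) -> TaskCreated:
    """Parse the JSON response body and raise if the task was not created."""
    data = json.loads(body)
    base = data.get("base_resp") or {}
    # ASSUMPTION: status_code 0 means success; this page does not list the codes.
    if base.get("status_code", 0) != 0:
        raise RuntimeError(
            f"task creation failed: {base.get('status_code')} {base.get('status_msg')}"
        )
    return TaskCreated(
        task_id=data["task_id"],
        file_id=data.get("file_id"),
        task_token=data.get("task_token"),
        usage_characters=data.get("usage_characters"),
    )


# Made-up sample body, shaped like the response schema above.
sample = (
    '{"file_id": 1, "task_id": "t-1", "task_token": "tok", "usage_characters": 12,'
    ' "base_resp": {"status_code": 0, "status_msg": "success"}}'
)
created = parse_create_response(sample)
print(created.task_id)  # t-1
```

The returned task_id is the value to pass to the Task Result API when fetching the generated audio.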