MiniMax Speech 2.8 HD Async Text-to-Speech
POST
MiniMax asynchronous text-to-speech API. It supports voice, emotion, speed, and other parameter settings. Text input is limited to 50,000 characters; file input supports up to 100,000 characters.
Request Headers
Content-Type: Supports application/json.
Authorization: Bearer authentication format, for example: Bearer {{API Key}}.
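As a minimal sketch of the two headers described above, the following builds (but does not send) a JSON POST request using only the Python standard library. The URL and API key are placeholders, and `build_request` is a hypothetical helper name; this page does not state the endpoint URL, so substitute the real one. The single `text` field in the payload is only there to give the request a body.

```python
import json
import urllib.request


def build_request(url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request with Bearer authentication."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder values: the URL below is hypothetical and not taken from this page.
req = build_request(
    "https://example.invalid/async-tts",
    "YOUR_API_KEY",
    {"text": "Hello there."},
)
```

Passing the resulting object to `urllib.request.urlopen` would send it once a real endpoint and key are supplied.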
Request Body
Text to synthesize into audio. Maximum length is 50,000 characters. Either text or text_file_id is required.
- Interjection tags: Only supported when model is speech-2.8-hd or speech-2.8-turbo. Supported interjections:
  - (laughs): laughter
  - (chuckle): light laugh
  - (coughs): cough
  - (clear-throat): clear throat
  - (groans): groan
  - (breath): normal breathing
  - (pant): panting
  - (inhale): inhale
  - (exhale): exhale
  - (gasps): gasp
  - (sniffs): sniff
  - (sighs): sigh
  - (snorts): snort
  - (burps): burp
  - (lip-smacking): lip smacking
  - (humming): humming
  - (hissing): hissing
  - (emm): um
  - (whistles): whistle
  - (sneezes): sneeze
  - (crying): crying
  - (applause): applause
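The limits above can be checked on the client before submitting a task. The sketch below is one possible pre-flight check, not part of the API: `find_interjections` and `check_text` are hypothetical helper names, and counting characters with Python's `len` is an assumption, since this page does not say how the service counts them.

```python
import re

MAX_TEXT_CHARS = 50_000

# Interjection tags listed in the text parameter description.
SUPPORTED_INTERJECTIONS = {
    "(laughs)", "(chuckle)", "(coughs)", "(clear-throat)", "(groans)",
    "(breath)", "(pant)", "(inhale)", "(exhale)", "(gasps)", "(sniffs)",
    "(sighs)", "(snorts)", "(burps)", "(lip-smacking)", "(humming)",
    "(hissing)", "(emm)", "(whistles)", "(sneezes)", "(crying)", "(applause)",
}
INTERJECTION_MODELS = {"speech-2.8-hd", "speech-2.8-turbo"}

# A tag candidate is a lowercase word (hyphens allowed) in parentheses.
_TAG_RE = re.compile(r"\([a-z-]+\)")


def find_interjections(text: str) -> list[str]:
    """Return the supported interjection tags in text, in order of appearance."""
    return [tag for tag in _TAG_RE.findall(text) if tag in SUPPORTED_INTERJECTIONS]


def check_text(text: str, model: str) -> list[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    if len(text) > MAX_TEXT_CHARS:
        problems.append(f"text has {len(text)} characters, limit is {MAX_TEXT_CHARS}")
    if find_interjections(text) and model not in INTERJECTION_MODELS:
        problems.append(f"interjection tags are not supported by model {model!r}")
    return problems
```

For example, `check_text("Hi (laughs)", "speech-2.8-hd")` returns an empty list, while the same text with any other model name yields one problem.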
Text file ID for audio synthesis. A single file must be shorter than 100,000 characters. Supported file formats: txt, zip. Either text or text_file_id is required. The format is validated automatically.
- txt file: Length limit is fewer than 100,000 characters. Custom pauses are supported with the <#x#> tag, where x is the pause duration in seconds, in the range [0.01, 99.99], with up to 2 decimal places. A pause must be placed between two pronounceable text segments, and multiple pause tags cannot be used consecutively.
- zip file:
  - The compressed package must contain txt or json files of the same format.
  - json file format: Supports three fields, [title, content, extra], representing the title, body, and additional information. If all three fields exist, 3 groups of results are produced, 9 files in total, stored in one folder. If a field does not exist or is empty, no corresponding result is generated.
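The pause-tag rules for txt files can be checked locally before upload. The following is a rough sketch under stated assumptions: `check_pauses` is a hypothetical helper, and "pronounceable" is approximated as "contains at least one letter or digit", which is looser than whatever the service actually applies.

```python
import re

# Matches anything shaped like a pause tag, e.g. <#1.5#>; the value is checked separately.
_PAUSE_RE = re.compile(r"<#(.*?)#>")
# Digits with at most 2 decimal places.
_VALUE_RE = re.compile(r"[0-9]+(\.[0-9]{1,2})?")


def _pronounceable(segment: str) -> bool:
    # Assumption: any letter or digit counts as pronounceable text.
    return any(ch.isalnum() for ch in segment)


def check_pauses(text: str) -> list[str]:
    """Return a list of problems with <#x#> pause tags; empty means none found."""
    problems = []
    prev_end = 0
    matches = list(_PAUSE_RE.finditer(text))
    for m in matches:
        value = m.group(1)
        if not _VALUE_RE.fullmatch(value) or not (0.01 <= float(value) <= 99.99):
            problems.append(f"invalid pause duration {value!r}")
        # An empty segment before the tag means a leading or consecutive tag.
        if not _pronounceable(text[prev_end:m.start()]):
            problems.append(f"{m.group(0)} must follow pronounceable text")
        prev_end = m.end()
    if matches and not _pronounceable(text[prev_end:]):
        problems.append("the last pause tag must be followed by pronounceable text")
    return problems
```

With this sketch, `check_pauses("Hello<#1.5#>world")` returns an empty list, while `"Hello<#1.5#><#2#>world"` reports the second tag because no text separates it from the first.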
Controls whether to add an audio rhythm identifier at the end of the synthesized audio. Default is False. This parameter is only valid for non-streaming synthesis.
Whether to enhance recognition ability for specified minor languages and dialects. Default is null; can be set to auto to let the model decide automatically.
Optional values: Chinese, Chinese,Yue, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans, auto
Enable this parameter to make clause transitions more natural. Only supported by the speech-2.8-hd and speech-2.8-turbo models.
Response
Corresponding audio file ID returned after task creation.
- After the task completes, use file_id to download the audio.
- This field is not returned when the request fails.
Use the task_id to retrieve the generated outputs.
Token used to complete the current task
Billable character count
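Because file_id is omitted when a request fails, client code should not assume it is present. The sketch below shows one way to read the two identifiers named above. It assumes the parsed JSON response is a flat dictionary with top-level `task_id` and `file_id` keys, which this page does not confirm; `summarize_task` is a hypothetical helper.

```python
def summarize_task(response: dict) -> dict:
    """Extract the identifiers from a parsed response without assuming file_id exists."""
    task_id = response.get("task_id")
    file_id = response.get("file_id")  # not returned when the request fails
    return {
        "task_id": task_id,
        "file_id": file_id,
        "created": file_id is not None,
    }


# Made-up identifiers, for illustration only.
ok = summarize_task({"task_id": "t-1", "file_id": "f-1"})
failed = summarize_task({})
```

Here `ok["created"]` is `True` and `failed["created"]` is `False`, so callers can branch on a single flag before attempting a download.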