This API supports asynchronous text-to-speech (TTS) generation and accepts up to 1 million characters of input text per request. The resulting audio is retrieved asynchronously. Over 100 system and cloned voices are available, with customizable parameters including pitch, speed, volume, bitrate, sample rate, and output format.
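Text longer than the per-request limit has to be split across several requests. The following is a minimal sketch of one way to do that, not part of the API itself. It assumes the limit counts characters the same way Python's `len` does, and the helper name `split_for_requests` is hypothetical.

```python
MAX_CHARS_PER_REQUEST = 1_000_000  # per-request limit stated above


def split_for_requests(text: str, limit: int = MAX_CHARS_PER_REQUEST) -> list[str]:
    """Split text into pieces of at most `limit` characters.

    Prefers to cut just after the last newline inside each window so that
    paragraphs stay together; falls back to a hard cut when there is none.
    Assumption: the API counts characters the way len() does.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            cut = text.rfind("\n", start, end)
            if cut > start:
                end = cut + 1  # keep the newline with the preceding chunk
        chunks.append(text[start:end])
        start = end
    return chunks
```

Each returned piece can then be submitted as its own task, and joining the pieces reproduces the original text exactly.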
After submitting a long-text TTS request, please note that the returned audio URL is valid for 24 hours from the time it is generated. Be sure to download the audio within this period.
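As a rough illustration of the 24-hour window, the sketch below computes when a URL stops being valid. It assumes your own code records the time at which the URL was generated; the function names are hypothetical and not part of the API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

URL_LIFETIME = timedelta(hours=24)  # validity period stated above


def url_expires_at(generated_at: datetime) -> datetime:
    """Return the moment the audio URL stops being valid."""
    return generated_at + URL_LIFETIME


def is_url_usable(generated_at: datetime, now: Optional[datetime] = None) -> bool:
    """True while the URL is still inside its 24-hour window.

    Assumption: `generated_at` is a timezone-aware timestamp recorded by the
    caller when the URL was received.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now < url_expires_at(generated_at)
```

A download job can check `is_url_usable` before fetching and resubmit the text if the window has passed.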
This API is best suited for long-form text-to-speech generation, such as entire books, so expect longer task queue times. For short sentences, voice chat, or online social scenarios, we recommend synchronous TTS instead.
Enables English text normalization, which improves how numbers are read aloud at the cost of slightly higher latency. If not provided, the default is false.
Defines replacements for text, symbols, and their pronunciations that require manual handling.
Replace the pronunciation (adjust the tone/replace the pronunciation of other characters) using the following format:
["omg/oh my god"]
For Chinese texts, tones are replaced by numbers, with 1 for the first tone (high), 2 for the second tone (rising), 3 for the third tone (low/dipping), 4 for the fourth tone (falling), and 5 for the fifth tone (neutral).
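The replacement list can be assembled programmatically. The sketch below builds entries in the `original/replacement` form shown above; the helper names are hypothetical, and the check that rejects a `/` in the original text is a precaution of this sketch rather than a documented rule.

```python
import json


def tone_entry(original: str, replacement: str) -> str:
    """Build one 'original/replacement' entry, e.g. 'omg/oh my god'."""
    if not original or not replacement:
        raise ValueError("both original and replacement text are required")
    if "/" in original:
        # Precaution (assumption): a slash in the original text would make
        # the entry ambiguous, since '/' is the separator.
        raise ValueError("original text must not contain '/'")
    return f"{original}/{replacement}"


def tone_list(replacements: dict[str, str]) -> list[str]:
    """Turn a mapping of original -> replacement into the list format."""
    return [tone_entry(src, dst) for src, dst in replacements.items()]


entries = tone_list({"omg": "oh my god"})
print(json.dumps(entries))
```

The resulting list is what gets serialized into the request body under the pronunciation parameter.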
The weight of a timbre. It must be provided together with voice_id. Up to 4 timbres can be mixed, and each weight must be an integer. The higher the proportion for a single timbre, the more the synthesized voice will resemble it.
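The constraints above (a voice_id with every weight, integer weights, at most 4 timbres) can be checked before a request is sent. This is a minimal sketch under those constraints only. The function names and the second voice ID are hypothetical, and the requirement that the weights sum to a positive number exists only so that the proportion can be computed here.

```python
def build_timbre_mix(weights: dict[str, int]) -> list[dict]:
    """Validate a voice_id -> weight mapping and return a list of entries."""
    if not 1 <= len(weights) <= 4:
        raise ValueError("between 1 and 4 timbres can be mixed")
    mix = []
    for voice_id, weight in weights.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(weight, bool) or not isinstance(weight, int):
            raise TypeError(f"weight for {voice_id!r} must be an integer")
        mix.append({"voice_id": voice_id, "weight": weight})
    return mix


def proportions(mix: list[dict]) -> dict[str, float]:
    """Share of the total weight held by each timbre."""
    total = sum(entry["weight"] for entry in mix)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {entry["voice_id"]: entry["weight"] / total for entry in mix}


# "Hypothetical_Voice" is a placeholder, not a real voice ID.
mix = build_timbre_mix({"Wise_Woman": 3, "Hypothetical_Voice": 1})
print(proportions(mix))
```

Here "Wise_Woman" holds three quarters of the total weight, so the synthesized voice would be expected to resemble it most.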
Enhances recognition of specified minor languages and dialects. Setting this parameter can improve speech quality for the specified language or dialect. If the language is not known in advance, set it to "auto" and the model will detect it automatically. Supported values:
Below is an example of how to use the MiniMax Speech-02-hd asynchronous API.
Generate a task_id by sending a POST request to the MiniMax Speech-02-hd API.
Request:
curl -X POST https://api.novita.ai/v3/async/minimax-speech-02-hd \
  -H "Authorization: Bearer $your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Audio generation technology is evolving rapidly, enabling the creation of speech, music, and sound effects from text or data inputs. It supports applications in media, accessibility, customer service, and content creation. With improved quality and customization, these tools are increasingly integrated into digital platforms across various industries.",
    "voice_setting": {
      "speed": 1.1,
      "voice_id": "Wise_Woman",
      "emotion": "happy"
    }
  }'
Response:
{ "task_id": "{Returned Task ID}"}
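The same submission can be sketched in Python using only the standard library. The endpoint, headers, and body fields mirror the curl request above. The function names are hypothetical, and error handling is omitted for brevity.

```python
import json
import urllib.request
from typing import Optional

API_URL = "https://api.novita.ai/v3/async/minimax-speech-02-hd"


def build_tts_request(api_key: str, text: str, voice_id: str, speed: float,
                      emotion: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    voice_setting = {"voice_id": voice_id, "speed": speed}
    if emotion is not None:
        voice_setting["emotion"] = emotion
    body = json.dumps({"text": text, "voice_setting": voice_setting})
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(request: urllib.request.Request) -> str:
    """Send the request and return the task_id from the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["task_id"]
```

Calling `submit(build_tts_request(api_key, text, "Wise_Woman", 1.1, "happy"))` would return the task_id to use in the next step.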
Use the task_id to retrieve the output audio.
HTTP status codes in the 2xx range indicate the request was successfully accepted, while codes in the 5xx range indicate an internal server error.
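Client code can branch on these two ranges. The sketch below covers only the ranges described above; the function name and the returned labels are hypothetical, and any other code is reported as "other" because this section does not describe it.

```python
def classify_status(code: int) -> str:
    """Map an HTTP status code to the categories described above."""
    if 200 <= code <= 299:
        return "accepted"       # request was successfully accepted
    if 500 <= code <= 599:
        return "server_error"   # internal server error
    return "other"              # not covered by this section
```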
You can obtain the audio file from the audio_url field in the audios section of the response.
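Once the task result has been fetched and parsed as JSON, the URLs can be pulled out as sketched below. This assumes `audios` is a list of objects that each carry an `audio_url` field. The exact response shape is not shown in this section, so treat the structure and the example URL as illustrative.

```python
def extract_audio_urls(result: dict) -> list[str]:
    """Collect every audio_url found in the `audios` section of a result.

    Assumption: `audios` is a list of objects; entries without an
    audio_url are skipped rather than raising an error.
    """
    audios = result.get("audios") or []
    return [item["audio_url"] for item in audios if item.get("audio_url")]


# Illustrative payload; the URL is a placeholder, not a real file.
example = {"audios": [{"audio_url": "https://example.com/audio.mp3"}]}
print(extract_audio_urls(example))
```

An empty list means the audio is not available yet or the task produced no output.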