# TTA Speech 02 HD ASYNC API | MiniMax Text to Speech


Source: /docs/api-reference/model-apis-minimax-speech-02-hd-async

# MiniMax Speech-02-hd Async Long TTS

POST `/v3/async/minimax-speech-02-hd`


cURL

```
curl --request POST \
  --url https://api.novita.ai/v3/async/minimax-speech-02-hd \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: <content-type>' \
  --data '
{
  "text": "<string>",
  "voice_setting": {
    "speed": 123,
    "vol": 123,
    "pitch": 123,
    "voice_id": "<string>",
    "emotion": "<string>",
    "text_normalization": true
  },
  "audio_setting": {
    "sample_rate": 123,
    "bitrate": 123,
    "format": "<string>",
    "channel": 123
  },
  "pronunciation_dict": {
    "tone": [
      {}
    ]
  },
  "language_boost": "<string>",
  "voice_modify": {
    "pitch": 123,
    "intensity": 123,
    "timbre": 123,
    "sound_effects": "<string>"
  }
}
'
```

200

```
{
  "task_id": "<string>"
}
```

This API supports asynchronous text-to-speech (TTS) generation, with a maximum of 1 million characters of text input per request. The resulting audio is retrieved asynchronously. Over 100 system and cloned voices are available, with customizable pitch, speed, volume, bitrate, sample rate, and output format.

The returned audio URL is valid for 24 hours from the time it is generated, so download the audio within that period.
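The 24-hour window can be checked programmatically when the returned link is an S3-style presigned URL, as in the example response at the end of this page. The following is a minimal sketch, assuming the URL carries `X-Amz-Date` and `X-Amz-Expires` query parameters; the helper name `presigned_url_expiry` and the sample URL are illustrative and not part of the API.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_url_expiry(url: str) -> datetime:
    """Return the UTC time at which an S3-style presigned URL stops working.

    Assumes the URL has X-Amz-Date (signing time) and X-Amz-Expires
    (lifetime in seconds) query parameters.
    """
    query = parse_qs(urlsplit(url).query)
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return signed_at.replace(tzinfo=timezone.utc) + lifetime


# Hypothetical URL, shortened for illustration.
url = (
    "https://example.com/audio.mp3"
    "?X-Amz-Date=20250710T101656Z&X-Amz-Expires=86400"
)
print(presigned_url_expiry(url).isoformat())
```

Comparing the result with `datetime.now(timezone.utc)` shows how much time is left to download the file.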

Best suited for long-form text-to-speech generation, such as entire books. Task queue times may be longer. For short sentence generation, voice chat, or online social scenarios, we recommend using [synchronous TTS](/docs/api-reference/model-apis-minimax-speech-02-hd).

## Request Headers

`Content-Type` (string, required)

Enum: `application/json`

`Authorization` (string, required)

Bearer authentication format, for example: `Bearer {{API Key}}`.
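As a small illustration of the two required headers, the sketch below builds them in Python. The function name `build_headers` and the environment variable `NOVITA_API_KEY` are assumptions made for this example, not names defined by the API.

```python
import os


def build_headers(api_key: str) -> dict[str, str]:
    """Build the two required request headers."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }


# NOVITA_API_KEY is an assumed environment variable name.
headers = build_headers(os.environ.get("NOVITA_API_KEY", "your_api_key"))
```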

## Request Body

`text` (string, required)

The text to be synthesized. Maximum length: 50,000 characters.
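For input longer than the stated `text` limit, one option is to split it on the client and submit one task per chunk. The sketch below is one possible approach, not something the API prescribes: the helper `split_text` is hypothetical, and it prefers to cut after sentence-ending punctuation or a newline.

```python
MAX_TEXT_CHARS = 50_000  # limit stated for the `text` field


def split_text(text: str, limit: int = MAX_TEXT_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers to cut just after the last sentence-ending character that
    fits; falls back to a hard cut when none is found.
    """
    chunks = []
    rest = text
    while len(rest) > limit:
        window = rest[:limit]
        cut = max(window.rfind(ch) for ch in ".!?\n") + 1
        if cut <= 0:
            cut = limit  # no sentence boundary in the window
        chunks.append(rest[:cut])
        rest = rest[cut:]
    if rest:
        chunks.append(rest)
    return chunks


parts = split_text("Hello. World. Again.", limit=8)
print(parts)
```

Joining the chunks in order reproduces the original text exactly, so the resulting audio files can be concatenated in the same order.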

`voice_setting` (object, required)

`voice_setting.speed` (number, optional, default: 1.0)

Controls the speech rate of the generated audio. Range: [0.5, 2]. Higher values result in faster speech.

`voice_setting.vol` (number, optional, default: 1.0)

Controls the volume of the generated audio. Range: (0, 10]. Higher values result in louder audio.

`voice_setting.pitch` (number, optional, default: 0)

Controls the pitch of the generated audio. Range: [-12, 12]. The value must be an integer; 0 means original timbre.

`voice_setting.voice_id` (string)

The requested voice ID. Both system voice IDs and cloned voice IDs are supported. The available system voice IDs are as follows:

- Wise_Woman

- Friendly_Person

- Inspirational_girl

- Deep_Voice_Man

- Calm_Woman

- Casual_Guy

- Lively_Girl

- Patient_Man

- Young_Knight

- Determined_Man

- Lovely_Girl

- Decent_Boy

- Imposing_Manner

- Elegant_Man

- Abbess

- Sweet_Girl_2

- Exuberant_Girl

`voice_setting.emotion` (string)

Controls the emotion of the synthesized speech. Allowed values: `happy`, `sad`, `angry`, `fearful`, `disgusted`, `surprised`, `neutral`.

`voice_setting.text_normalization` (bool, default: false)

Enables English text normalization, which improves how numbers are read at the cost of slightly higher latency. If not provided, the default is false.
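The ranges listed for `voice_setting` can be checked on the client before a request is sent. The following is a minimal sketch using only the constraints stated above; the function name `validate_voice_setting` is hypothetical, and the server remains the authority on what it accepts.

```python
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"}


def validate_voice_setting(setting: dict) -> list[str]:
    """Return a list of problems found in a voice_setting dict (empty if none)."""
    problems = []
    speed = setting.get("speed", 1.0)
    if not 0.5 <= speed <= 2:
        problems.append("speed must be in [0.5, 2]")
    vol = setting.get("vol", 1.0)
    if not 0 < vol <= 10:
        problems.append("vol must be in (0, 10]")
    pitch = setting.get("pitch", 0)
    if not (isinstance(pitch, int) and -12 <= pitch <= 12):
        problems.append("pitch must be an integer in [-12, 12]")
    emotion = setting.get("emotion")
    if emotion is not None and emotion not in EMOTIONS:
        problems.append(f"unsupported emotion: {emotion!r}")
    return problems


print(validate_voice_setting({"speed": 1.1, "voice_id": "Wise_Woman", "emotion": "happy"}))
```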

`audio_setting` (object)

`audio_setting.sample_rate` (number, optional, default: 32000)

The sample rate of the generated audio. Allowed values: 8000, 16000, 22050, 24000, 32000, 44100.

`audio_setting.bitrate` (number, optional, default: 128000)

The bitrate of the generated audio. Allowed values: 32000, 64000, 128000, 256000. This parameter only applies to the mp3 audio format.

`audio_setting.format` (string, default: `mp3`)

The audio format of the output. Options: `mp3`, `pcm`, `flac`, `wav`. `wav` is only supported for non-streaming output.

`audio_setting.channel` (number, default: 1)

Number of audio channels. Options: 1 (mono), 2 (stereo).
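The allowed values for `audio_setting` lend themselves to a small builder that rejects anything outside the documented sets. This is a sketch under stated assumptions: the function name `build_audio_setting` is hypothetical, and omitting `bitrate` for non-mp3 formats is a design choice of this example (based on the note that bitrate only applies to mp3), not documented API behavior.

```python
SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100}
BITRATES = {32000, 64000, 128000, 256000}
FORMATS = {"mp3", "pcm", "flac", "wav"}


def build_audio_setting(sample_rate=32000, bitrate=128000, fmt="mp3", channel=1):
    """Build an audio_setting dict, rejecting values outside the documented sets."""
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if channel not in (1, 2):
        raise ValueError(f"unsupported channel: {channel}")
    setting = {"sample_rate": sample_rate, "format": fmt, "channel": channel}
    # bitrate is documented as applying to mp3 only, so it is omitted otherwise.
    if fmt == "mp3":
        if bitrate not in BITRATES:
            raise ValueError(f"unsupported bitrate: {bitrate}")
        setting["bitrate"] = bitrate
    return setting


print(build_audio_setting(sample_rate=44100, fmt="flac", channel=2))
```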

`pronunciation_dict` (object)

`pronunciation_dict.tone` (list)

Replacements for text or symbols whose pronunciation requires manual handling. Each entry replaces a pronunciation (adjusting the tone or substituting the pronunciation of other characters) using the following format: `["omg/oh my god"]`

For Chinese text, tones are written as numbers: 1 for the first tone (high), 2 for the second tone (rising), 3 for the third tone (low/dipping), 4 for the fourth tone (falling), and 5 for the fifth tone (neutral).
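The `source/pronunciation` entry format can be generated from a plain mapping. The sketch below is illustrative only: the helper `build_pronunciation_dict` is hypothetical, and rejecting a `/` in the source text is an assumption of this example (the first `/` is taken to be the separator), not a rule stated by the API.

```python
def build_pronunciation_dict(replacements: dict[str, str]) -> dict:
    """Turn {"source": "spoken form"} pairs into a pronunciation_dict object."""
    tone = []
    for source, spoken in replacements.items():
        # Assumption: the first "/" separates source text from pronunciation.
        if "/" in source:
            raise ValueError(f"source text must not contain '/': {source!r}")
        tone.append(f"{source}/{spoken}")
    return {"tone": tone}


print(build_pronunciation_dict({"omg": "oh my god"}))
```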

`language_boost` (string, default: null)

Enhances recognition of the specified minor language or dialect, which can improve speech quality for text in that language. If the language is not known in advance, set this parameter to `auto` and the model will determine the language automatically. Supported values: `'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'Bulgarian', 'Danish', 'Hebrew', 'Malay', 'Persian', 'Slovak', 'Swedish', 'Croatian', 'Filipino', 'Hungarian', 'Norwegian', 'Slovenian', 'Catalan', 'Nynorsk', 'Tamil', 'Afrikaans', 'auto'`

`voice_modify` (object)

Voice FX settings. Supported audio formats for this parameter: mp3, wav, flac.

`voice_modify.pitch` (integer)

Pitch adjustment (darker/brighter). Range: [-100, 100]. Values closer to -100 generate a deeper (darker) voice; values closer to 100 produce a brighter voice.

`voice_modify.intensity` (integer)

Intensity adjustment (powerful/soft). Range: [-100, 100]. Values closer to -100 generate a more powerful sound; values closer to 100 result in a softer sound.

`voice_modify.timbre` (integer)

Timbre adjustment (magnetic/crisp). Range: [-100, 100]. Values closer to -100 make the voice fuller and more magnetic; values closer to 100 make it crisper.

`voice_modify.sound_effects` (string)

Sound effect setting. Only one may be selected per request. Valid values:

- `spacious_echo` (large space echo)

- `auditorium_echo` (auditorium broadcast)

- `lofi_telephone` (telephone distortion)

- `robotic` (robotic effect)
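The `voice_modify` ranges and the single-effect rule can be enforced with a small builder. This is a hedged sketch: the function name `build_voice_modify` is hypothetical, and leaving out fields that are not supplied is a choice made for this example because no defaults are documented.

```python
SOUND_EFFECTS = {"spacious_echo", "auditorium_echo", "lofi_telephone", "robotic"}


def build_voice_modify(pitch=None, intensity=None, timbre=None, sound_effects=None):
    """Build a voice_modify dict, including only the fields that were given."""
    modify = {}
    for name, value in (("pitch", pitch), ("intensity", intensity), ("timbre", timbre)):
        if value is None:
            continue
        if not (isinstance(value, int) and -100 <= value <= 100):
            raise ValueError(f"{name} must be an integer in [-100, 100]")
        modify[name] = value
    if sound_effects is not None:
        if sound_effects not in SOUND_EFFECTS:
            raise ValueError(f"unsupported sound_effects: {sound_effects!r}")
        modify["sound_effects"] = sound_effects
    return modify


print(build_voice_modify(pitch=-20, sound_effects="robotic"))
```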

## Response

`task_id` (string)

Pass the `task_id` to the [Task Result API](/docs/api-reference/model-apis-task-result) to retrieve the generated outputs.
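Because the task runs asynchronously, a client typically polls until the task finishes. The sketch below shows one way to do that. Several details are assumptions and should be confirmed against the Task Result API page: the endpoint URL `https://api.novita.ai/v3/async/task-result`, the `task_id` query parameter, and the failure status name `TASK_STATUS_FAILED`. Only `TASK_STATUS_SUCCEED` appears in the example response on this page. The function names are hypothetical.

```python
import json
import time
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint; confirm it against the Task Result API page.
TASK_RESULT_URL = "https://api.novita.ai/v3/async/task-result"


def fetch_task_result(task_id: str, api_key: str) -> dict:
    """Fetch the current task state over HTTP (performs a network call)."""
    request = urllib.request.Request(
        f"{TASK_RESULT_URL}?{urlencode({'task_id': task_id})}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_task(fetch, interval: float = 5.0, max_polls: int = 120) -> dict:
    """Call `fetch()` until the task succeeds, fails, or polls run out."""
    for _ in range(max_polls):
        result = fetch()
        status = result["task"]["status"]
        if status == "TASK_STATUS_SUCCEED":
            return result
        if status == "TASK_STATUS_FAILED":  # assumed failure status name
            raise RuntimeError(result["task"].get("reason") or "task failed")
        time.sleep(interval)
    raise TimeoutError("task did not finish in time")
```

A typical call would be `wait_for_task(lambda: fetch_task_result(task_id, api_key))`. Passing the fetch function as an argument keeps the polling loop independent of the HTTP client.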

## Example

Below is an example of how to use the MiniMax Speech-02-hd asynchronous API.

- Generate a `task_id` by sending a POST request to the MiniMax Speech-02-hd API.

`Request:`

```
curl \
-X POST https://api.novita.ai/v3/async/minimax-speech-02-hd \
-H "Authorization: Bearer $your_api_key" \
-H "Content-Type: application/json" \
-d '{
"text": "Audio generation technology is evolving rapidly, enabling the creation of speech, music, and sound effects from text or data inputs. It supports applications in media, accessibility, customer service, and content creation. With improved quality and customization, these tools are increasingly integrated into digital platforms across various industries.",
"voice_setting": {
"speed": 1.1,
"voice_id": "Wise_Woman",
"emotion": "happy"
}
}'
```

`Response:`

```
{
"task_id": "{Returned Task ID}"
}
```
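The same submission can be made from Python with the standard library. This is a rough equivalent of the cURL request above, not an official client; the function names `build_submit_request` and `submit` are hypothetical, and the text is shortened for illustration.

```python
import json
import urllib.request

API_URL = "https://api.novita.ai/v3/async/minimax-speech-02-hd"


def build_submit_request(api_key: str, text: str, voice_setting: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for a TTS task."""
    body = json.dumps({"text": text, "voice_setting": voice_setting}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def submit(request: urllib.request.Request) -> str:
    """Send the request and return the task_id (performs a network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["task_id"]


request = build_submit_request(
    "your_api_key",
    "Audio generation technology is evolving rapidly.",
    {"speed": 1.1, "voice_id": "Wise_Woman", "emotion": "happy"},
)
```

Calling `submit(request)` with a real API key sends the request and returns the `task_id` from the response body.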

- Use the `task_id` to retrieve the output audio from the [Task Result API](/docs/api-reference/model-apis-task-result).
  HTTP status codes in the 2xx range indicate that the request was accepted, while codes in the 5xx range indicate an internal server error.
  The audio file is available from the `audio_url` field of each entry in the `audios` array of the response.

`Response:`

```
{
"extra": {},
"task": {
"task_id": "57afda9c-0d75-4f89-ab4e-554f902a5a4e",
"task_type": "MINIMAX_SPEECH_02_HD",
"status": "TASK_STATUS_SUCCEED",
"reason": "",
"eta": 0,
"progress_percent": 0
},
"images": [],
"videos": [],
"audios": [
{
"audio_url": "https://faas-minimax-audio-v2.s3.ap-southeast-1.amazonaws.com/test/289125447749800-225bb323-3def-4b7f-b92f-7eccade04f3d.mp3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIASVPYCN6LRCW3SOUV%2F20250710%2Fap-southeast-1%2Fs3%2Faws4_request&X-Amz-Date=20250710T101656Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&x-id=GetObject&X-Amz-Signature=bce290288e8e2017fd0daec4cca4938c3c72abb9d027482c3d95ca275529a8bb",
"audio_url_ttl": "0",
"audio_type": "mp3",
"audio_metadata": null
}
]
}
```
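Given a response shaped like the one above, the audio links can be pulled out with a few lines of Python. This is a minimal sketch: the helper `extract_audio_urls` is hypothetical, and the sample data is a trimmed version of the response with a placeholder URL.

```python
def extract_audio_urls(result: dict) -> list[str]:
    """Return the audio URLs from a Task Result response once it has succeeded."""
    status = result["task"]["status"]
    if status != "TASK_STATUS_SUCCEED":
        raise ValueError(f"task not finished, status is {status}")
    return [audio["audio_url"] for audio in result.get("audios", [])]


# Trimmed version of the response above; the URL is a placeholder.
result = {
    "task": {
        "task_id": "57afda9c-0d75-4f89-ab4e-554f902a5a4e",
        "status": "TASK_STATUS_SUCCEED",
    },
    "audios": [{"audio_url": "https://example.com/audio.mp3", "audio_type": "mp3"}],
}
print(extract_audio_urls(result))
```

Each returned URL can then be downloaded, for example with `urllib.request.urlretrieve`, while the link is still valid.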


Last modified on February 5, 2026
