# Novita AI MiniMax Speech 2.6 HD Preview ASYNC API | High-Definition Text-to-Speech

Source: /docs/api-reference/model-apis-minimax-speech-2.6-hd-async

# MiniMax Speech-2.6-hd Async Long TTS

`POST /v3/async/minimax-speech-2.6-hd`

cURL

```
curl --request POST \
  --url https://api.novita.ai/v3/async/minimax-speech-2.6-hd \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: <content-type>' \
  --data '
{
  "text": "<string>",
  "voice_setting": {
    "speed": 123,
    "vol": 123,
    "pitch": 123,
    "voice_id": "<string>",
    "emotion": "<string>",
    "text_normalization": true
  },
  "audio_setting": {
    "sample_rate": 123,
    "bitrate": 123,
    "format": "<string>",
    "channel": 123
  },
  "pronunciation_dict": {
    "tone": [
      {}
    ]
  },
  "language_boost": "<string>",
  "voice_modify": {
    "pitch": 123,
    "intensity": 123,
    "timbre": 123,
    "sound_effects": "<string>"
  }
}
'
```

200

```
{
  "task_id": "<string>"
}
```

MiniMax’s high-definition text-to-speech model. Compared to Speech 02, released in May, Speech 2.6 brings three major improvements: stronger multilingual expressiveness, more accurate voice replication, and broader coverage with 40 languages.

This asynchronous endpoint is best suited for long-form text-to-speech generation, such as entire books. Task queue times may be longer. For short sentence generation, voice chat, or online social scenarios, we recommend using [synchronous TTS](/docs/api-reference/model-apis-minimax-speech-2.6-hd).

## Request Headers

`Content-Type` (string, required)

Supports: `application/json`

`Authorization` (string, required)

Bearer authentication format, for example: `Bearer {{API Key}}`.
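As an illustration of the two required headers, the following is a minimal Python sketch that builds the POST request with the standard library. The helper names `build_request` and `submit` are hypothetical, and `submit` assumes the response body is the JSON object with a `task_id` field shown in the 200 example.

```python
import json
import urllib.request

API_URL = "https://api.novita.ai/v3/async/minimax-speech-2.6-hd"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request with both required headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # Bearer authentication format
            "Content-Type": "application/json",    # the only supported content type
        },
        method="POST",
    )


def submit(api_key: str, payload: dict) -> str:
    """Send the request and return the task_id (assumes a 200 JSON response)."""
    with urllib.request.urlopen(build_request(api_key, payload)) as resp:
        return json.load(resp)["task_id"]
```

Keeping request construction separate from sending makes it possible to inspect the headers and body without network access.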

## Request Body

`text` (string, required)

The text to be synthesized. Maximum length: 50,000 characters.
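Long inputs such as whole books exceed the 50,000-character limit and have to be sent as several requests. The sketch below is one possible way to split text on paragraph boundaries. The helper `split_text` is hypothetical, and it assumes the limit is counted in Python string characters, which this page does not confirm.

```python
MAX_CHARS = 50_000  # documented maximum length of the text field


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring paragraph breaks."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        # A single paragraph longer than the limit is cut at fixed offsets.
        while len(para) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:limit])
            para = para[limit:]
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be submitted as its own task.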

`voice_setting` (object, required)

Properties:

`voice_setting.speed` (number, optional, default: 1.0)

Range: [0.5, 2]. Controls the speech rate of the generated audio. Higher values result in faster speech.

`voice_setting.vol` (number, optional, default: 1.0)

Range: (0, 10]. Controls the volume of the generated audio. Higher values result in louder audio.

`voice_setting.pitch` (number, optional, default: 0)

Range: [-12, 12]; the value must be an integer. Controls the pitch of the generated audio. A value of 0 leaves the original voice unchanged.

`voice_setting.voice_id` (string)

The requested voice ID. Accepts both system voice IDs and cloned voice IDs. The available system voice IDs are as follows:

- Wise_Woman

- Friendly_Person

- Inspirational_girl

- Deep_Voice_Man

- Calm_Woman

- Casual_Guy

- Lively_Girl

- Patient_Man

- Young_Knight

- Determined_Man

- Lovely_Girl

- Decent_Boy

- Imposing_Manner

- Elegant_Man

- Abbess

- Sweet_Girl_2

- Exuberant_Girl

`voice_setting.emotion` (string)

Controls the emotion of the synthesized speech. Seven emotions are currently supported. Allowed values: `["happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"]`

`voice_setting.text_normalization` (bool, default: false)

Enables English text normalization, which improves how numbers are read aloud at the cost of a slight increase in latency. If not provided, the default is false.
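The documented ranges for `voice_setting` can be checked on the client before a task is queued. The following is a minimal sketch of such a check. The function `validate_voice_setting` is hypothetical, and the server may apply additional rules that are not listed on this page.

```python
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"}


def validate_voice_setting(vs: dict) -> list[str]:
    """Return a list of problems found in a voice_setting object (empty if none)."""
    errors = []
    speed = vs.get("speed", 1.0)
    if not 0.5 <= speed <= 2:  # closed range [0.5, 2]
        errors.append(f"speed {speed} is outside [0.5, 2]")
    vol = vs.get("vol", 1.0)
    if not 0 < vol <= 10:  # half-open range (0, 10]
        errors.append(f"vol {vol} is outside (0, 10]")
    pitch = vs.get("pitch", 0)
    is_int = isinstance(pitch, int) and not isinstance(pitch, bool)
    if not is_int or not -12 <= pitch <= 12:
        errors.append(f"pitch {pitch!r} must be an integer in [-12, 12]")
    emotion = vs.get("emotion")
    if emotion is not None and emotion not in EMOTIONS:
        errors.append(f"emotion {emotion!r} is not one of {sorted(EMOTIONS)}")
    return errors
```

Returning every problem at once, rather than raising on the first, keeps feedback cheap when a task may otherwise wait in the queue before failing.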

`audio_setting` (object)

Properties:

`audio_setting.sample_rate` (number, optional, default: 32000)

The sample rate of the generated audio. Allowed values: `[8000, 16000, 22050, 24000, 32000, 44100]`

`audio_setting.bitrate` (number, optional, default: 128000)

The bitrate of the generated audio. This parameter only applies to the mp3 audio format. Allowed values: `[32000, 64000, 128000, 256000]`

`audio_setting.format` (string, default: "mp3")

The audio format of the output. Options: `mp3`, `pcm`, `flac`, `wav`. `wav` is only supported for non-streaming output.

`audio_setting.channel` (number, default: 1)

Number of audio channels. Options:

- `1`: mono

- `2`: stereo
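The `audio_setting` fields each accept a fixed set of values, and `bitrate` only matters for mp3 output. The sketch below shows one way to build the object under those rules. The helper `make_audio_setting` is hypothetical, and omitting `bitrate` for non-mp3 formats is a design choice of this example, not a documented requirement.

```python
SAMPLE_RATES = (8000, 16000, 22050, 24000, 32000, 44100)
BITRATES = (32000, 64000, 128000, 256000)
FORMATS = ("mp3", "pcm", "flac", "wav")


def make_audio_setting(audio_format: str = "mp3", sample_rate: int = 32000,
                       bitrate: int = 128000, channel: int = 1) -> dict:
    """Build an audio_setting object, rejecting values outside the documented sets."""
    if audio_format not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"sample_rate must be one of {SAMPLE_RATES}")
    if channel not in (1, 2):
        raise ValueError("channel must be 1 (mono) or 2 (stereo)")
    setting = {"format": audio_format, "sample_rate": sample_rate, "channel": channel}
    if audio_format == "mp3":
        # bitrate only applies to mp3, so it is validated and sent only in that case
        if bitrate not in BITRATES:
            raise ValueError(f"bitrate must be one of {BITRATES}")
        setting["bitrate"] = bitrate
    return setting
```

Calling `make_audio_setting()` with no arguments reproduces the documented defaults.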

`pronunciation_dict` (object)

Properties:

`pronunciation_dict.tone` (list)

Replacements for text and symbols whose pronunciation requires manual handling. Each entry adjusts a tone or substitutes a different pronunciation, using the following format: `["omg/oh my god"]`

For Chinese texts, tones are written as numbers: 1 for the first tone (high), 2 for the second tone (rising), 3 for the third tone (low/dipping), 4 for the fourth tone (falling), and 5 for the fifth tone (neutral).
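To illustrate the `written/spoken` entry format, here is a small sketch that turns a mapping into a `pronunciation_dict` object. The helper `build_pronunciation_dict` is hypothetical. It assumes that `/` separates the written form from its replacement, inferred from the single `omg/oh my god` example, so it rejects written forms that contain a slash.

```python
def build_pronunciation_dict(replacements: dict[str, str]) -> dict:
    """Convert {written: spoken} pairs into the documented "written/spoken" entries."""
    tone = []
    for written, spoken in replacements.items():
        if "/" in written:
            # Assumption: the first slash is the separator, so it cannot appear here.
            raise ValueError(f"written form {written!r} must not contain '/'")
        tone.append(f"{written}/{spoken}")
    return {"tone": tone}
```

For example, `build_pronunciation_dict({"omg": "oh my god"})` yields the object from the format example above.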

`language_boost` (string, default: null)

Enhances recognition of the specified language or dialect, which can improve speech quality for text in that language. If the language is not known in advance, set this to `auto` and the model determines the language automatically. Supported values: `'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'Bulgarian', 'Danish', 'Hebrew', 'Malay', 'Persian', 'Slovak', 'Swedish', 'Croatian', 'Filipino', 'Hungarian', 'Norwegian', 'Slovenian', 'Catalan', 'Nynorsk', 'Tamil', 'Afrikaans', 'auto'`
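Because `language_boost` values are exact strings (and `Chinese,Yue` contains a comma), a lookup table can help avoid typos. The sketch below maps user input to the documented spelling. The helper `resolve_language_boost` is hypothetical, and the case-insensitive matching is a convenience of this example. Whether the API itself is case-sensitive is not stated here.

```python
from typing import Optional

LANGUAGE_BOOST_VALUES = (
    "Chinese", "Chinese,Yue", "English", "Arabic", "Russian", "Spanish", "French",
    "Portuguese", "German", "Turkish", "Dutch", "Ukrainian", "Vietnamese",
    "Indonesian", "Japanese", "Italian", "Korean", "Thai", "Polish", "Romanian",
    "Greek", "Czech", "Finnish", "Hindi", "Bulgarian", "Danish", "Hebrew", "Malay",
    "Persian", "Slovak", "Swedish", "Croatian", "Filipino", "Hungarian",
    "Norwegian", "Slovenian", "Catalan", "Nynorsk", "Tamil", "Afrikaans", "auto",
)
_BY_LOWER = {value.lower(): value for value in LANGUAGE_BOOST_VALUES}


def resolve_language_boost(name: Optional[str]) -> Optional[str]:
    """Return the documented spelling for `name`, or None to leave the field unset."""
    if name is None:
        return None  # matches the documented default of null
    try:
        return _BY_LOWER[name.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported language_boost value: {name!r}") from None
```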

`voice_modify` (object)

Voice FX settings. Supported audio formats for this parameter: mp3, wav, flac.

Properties:

`voice_modify.pitch` (integer)

Pitch adjustment (darker/brighter). Range: [-100, 100]. Values closer to -100 generate a deeper (darker) voice; values closer to 100 produce a brighter voice.

`voice_modify.intensity` (integer)

Intensity adjustment (powerful/soft). Range: [-100, 100]. Values closer to -100 generate a more powerful sound; values closer to 100 result in a softer sound.

`voice_modify.timbre` (integer)

Timbre adjustment (magnetic/crisp). Range: [-100, 100]. Values closer to -100 make the voice fuller and more magnetic; values closer to 100 make it crisper.

`voice_modify.sound_effects` (string)

Sound effect setting (only one may be selected per request). Valid values:

- `spacious_echo` (large space echo)

- `auditorium_echo` (auditorium broadcast)

- `lofi_telephone` (telephone distortion)

- `robotic` (robotic effect)
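The `voice_modify` constraints above (integer ranges, a single sound effect, and the mp3/wav/flac format restriction) can be sketched as a client-side check. The function `validate_voice_modify` is hypothetical. It assumes the format restriction refers to `audio_setting.format`, which the page implies but does not spell out.

```python
VOICE_MODIFY_FORMATS = {"mp3", "wav", "flac"}
SOUND_EFFECTS = {"spacious_echo", "auditorium_echo", "lofi_telephone", "robotic"}


def validate_voice_modify(vm: dict, audio_format: str = "mp3") -> list[str]:
    """Return a list of problems found in a voice_modify object (empty if none)."""
    errors = []
    if audio_format not in VOICE_MODIFY_FORMATS:
        errors.append(f"voice_modify is not supported for format {audio_format!r}")
    for key in ("pitch", "intensity", "timbre"):
        if key not in vm:
            continue
        value = vm[key]
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not -100 <= value <= 100:
            errors.append(f"{key} {value!r} must be an integer in [-100, 100]")
    effect = vm.get("sound_effects")
    if effect is not None and effect not in SOUND_EFFECTS:
        errors.append(f"sound_effects {effect!r} is not one of {sorted(SOUND_EFFECTS)}")
    return errors
```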

## Response

`task_id` (string)

The ID of the queued task. Pass it to the [Task Result API](/docs/api-reference/model-apis-task-result) to retrieve the generated outputs.
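Since the endpoint returns only a `task_id`, a client has to poll until the audio is ready. The sketch below shows a generic polling loop. This page does not describe the Task Result API's URL or response shape, so both are left to caller-supplied functions: `fetch` and `is_finished` are hypothetical parameters, as are the default interval and timeout.

```python
import time
from typing import Callable


def wait_for_task(task_id: str,
                  fetch: Callable[[str], dict],
                  is_finished: Callable[[dict], bool],
                  interval: float = 5.0,
                  timeout: float = 600.0,
                  sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch(task_id) repeatedly until is_finished(result) is true or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch(task_id)  # caller wraps the Task Result API request here
        if is_finished(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout} seconds")
        sleep(interval)
```

Because long-form tasks may wait in the queue, a generous timeout and a polling interval of several seconds are reasonable starting points.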

Last modified on February 5, 2026
