Best suited for short sentence generation, voice chat, and online social scenarios. Processing is fast, but the text length is limited to less than 10,000 characters. For long-form text, we recommend using asynchronous TTS synthesis.
Request Headers
Supports:
application/json
Bearer authentication format, for example: Bearer {{API Key}}.
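As a minimal sketch, the two header values above could be assembled as follows. The header names Content-Type and Authorization are assumed from standard HTTP usage, since this section lists only the values, and build_headers is a hypothetical helper rather than part of any SDK.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Return request headers for a JSON body with Bearer authentication."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {
        # Assumed header names; this section only documents the values.
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
```

The returned dictionary can be passed to whichever HTTP client you use.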
Request Body
The text to be synthesized. The length must be less than 10,000 characters. Use line breaks to separate paragraphs.
Custom speech pauses between text segments are supported, allowing you to control the timing of pauses in the generated audio. To set a pause, insert <#x#> between words or sentences, where x is the pause duration in seconds (supports 0.01–99.99, up to two decimal places).
Note: Pause markers must be placed between two segments that can be pronounced, and multiple consecutive pause markers are not allowed.
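The pause rules above can be checked on the client before sending a request. The following is a sketch under stated assumptions: pause and check_pauses are hypothetical helpers, markers separated only by whitespace are treated as consecutive, and a marker at the very start or end of the text is treated as not sitting between two pronounceable segments. The service may apply these rules differently.

```python
import re

# One pause marker: <#x#>, where x has up to two integer digits and two decimals.
_MARKER = r"<#(\d{1,2}(?:\.\d{1,2})?)#>"
MARKER_RE = re.compile(_MARKER)
# Two markers with nothing, or only whitespace, between them (assumed rule).
ADJACENT_RE = re.compile(_MARKER + r"\s*" + _MARKER)
TRAILING_RE = re.compile(_MARKER + r"$")


def pause(seconds: float) -> str:
    """Format a pause marker with two decimal places."""
    value = round(seconds, 2)
    if not 0.01 <= value <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    return f"<#{value:.2f}#>"


def check_pauses(text: str) -> list[str]:
    """Return the rule violations found in text (empty list if none)."""
    problems = []
    for match in MARKER_RE.finditer(text):
        if not 0.01 <= float(match.group(1)) <= 99.99:
            problems.append(f"pause out of range: {match.group(0)}")
    if ADJACENT_RE.search(text):
        problems.append("consecutive pause markers")
    stripped = text.strip()
    if MARKER_RE.match(stripped):
        problems.append("text starts with a pause marker")
    if TRAILING_RE.search(stripped):
        problems.append("text ends with a pause marker")
    return problems
```

For example, "Hello" + pause(0.5) + "world" produces a string that check_pauses accepts, while two markers back to back are reported.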
Required if voice_id is not provided (choose one of the two).
Whether to enable streaming. Default is false, i.e., streaming is disabled.
Enhances recognition of the specified less common languages and dialects. Setting this parameter can improve speech quality for the specified language or dialect. If the language is not known, set it to 'auto' and the model will determine the language automatically. Supported values:
'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'auto'
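A client can reject unsupported values before making a request. This sketch copies the list above into a set; check_language is a hypothetical helper, and the comparison is case-sensitive on the assumption that values must match the list exactly.

```python
SUPPORTED_LANGUAGES = frozenset({
    "Chinese", "Chinese,Yue", "English", "Arabic", "Russian", "Spanish",
    "French", "Portuguese", "German", "Turkish", "Dutch", "Ukrainian",
    "Vietnamese", "Indonesian", "Japanese", "Italian", "Korean", "Thai",
    "Polish", "Romanian", "Greek", "Czech", "Finnish", "Hindi", "auto",
})


def check_language(value: str) -> str:
    """Return value unchanged if it is in the supported list, else raise."""
    if value not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language value: {value!r}")
    return value
```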
Controls the output format of the result. Optional values are url and hex. The default is hex. This parameter only takes effect in non-streaming scenarios; in streaming mode, only hex format is supported. The returned URL is valid for 24 hours.
Response
The synthesized audio segment, encoded in hex and generated in the format specified by audio_setting.format (mp3/pcm/flac). The return format is determined by the output_format parameter. When stream is true, only hex format is supported.
The current status of the audio stream. Returned only when stream is true. 1 indicates synthesis in progress, 2 indicates synthesis completed.
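The response rules above can be sketched as three small helpers. The function and constant names are hypothetical, and the helpers take plain values rather than a parsed response because the response field names are not shown in this section. Treating a url request in streaming mode as hex, rather than as an error, is an assumption based on the statement that output_format has no effect when streaming.

```python
STATUS_IN_PROGRESS = 1  # synthesis in progress
STATUS_COMPLETED = 2    # synthesis completed


def effective_output_format(stream: bool, output_format: str = "hex") -> str:
    """Return the format the audio is expected to arrive in."""
    if output_format not in ("url", "hex"):
        raise ValueError("output_format must be 'url' or 'hex'")
    # In streaming mode only hex is supported, so the requested value is ignored.
    return "hex" if stream else output_format


def decode_hex_audio(hex_audio: str) -> bytes:
    """Convert a hex-encoded audio segment into raw bytes."""
    return bytes.fromhex(hex_audio)


def is_complete(status: int) -> bool:
    """Return True when the stream status reports that synthesis has completed."""
    if status not in (STATUS_IN_PROGRESS, STATUS_COMPLETED):
        raise ValueError(f"unknown status: {status}")
    return status == STATUS_COMPLETED
```

The decoded bytes can be written to a file whose extension matches audio_setting.format, for example opening out.mp3 in binary mode and writing the result of decode_hex_audio.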