For best results, upload reference audio using the create model endpoint before using this one. This improves speech quality and reduces latency.
Fish Audio converts text into speech.
Audio formats supported:
WAV / PCM
    Sample Rate: 8kHz, 16kHz, 24kHz, 32kHz, 44.1kHz
    Default Sample Rate: 44.1kHz
    16-bit, mono
MP3
    Sample Rate: 32kHz, 44.1kHz
    Default Sample Rate: 44.1kHz
    mono
    Bitrate: 64kbps, 128kbps (default), 192kbps
Opus
    Sample Rate: 48kHz
    Default Sample Rate: 48kHz
    mono
    Bitrate: -1000 (auto), 24kbps, 32kbps (default), 48kbps, 64kbps
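The supported formats listed above can be checked on the client before a request is sent. The following is a minimal sketch of that check, not part of the API: the table values are transcribed from the list above, while the function name `resolve_audio_settings` and the dictionary names are hypothetical.

```python
# Values transcribed from the supported-formats list; names are illustrative.
SAMPLE_RATES = {
    "wav": (8000, 16000, 24000, 32000, 44100),
    "pcm": (8000, 16000, 24000, 32000, 44100),
    "mp3": (32000, 44100),
    "opus": (48000,),
}
DEFAULT_SAMPLE_RATE = {"wav": 44100, "pcm": 44100, "mp3": 44100, "opus": 48000}
BITRATES_KBPS = {"mp3": (64, 128, 192), "opus": (-1000, 24, 32, 48, 64)}
DEFAULT_BITRATE_KBPS = {"mp3": 128, "opus": 32}


def resolve_audio_settings(fmt, sample_rate=None, bitrate=None):
    """Return (sample_rate_hz, bitrate_kbps), filling defaults and rejecting
    combinations that the list above does not mention."""
    if fmt not in SAMPLE_RATES:
        raise ValueError(f"unsupported format: {fmt!r}")
    rate = DEFAULT_SAMPLE_RATE[fmt] if sample_rate is None else sample_rate
    if rate not in SAMPLE_RATES[fmt]:
        raise ValueError(f"{fmt} does not support {rate} Hz")
    if fmt not in BITRATES_KBPS:
        # WAV and PCM have no bitrate setting.
        if bitrate is not None:
            raise ValueError(f"{fmt} has no bitrate setting")
        return rate, None
    kbps = DEFAULT_BITRATE_KBPS[fmt] if bitrate is None else bitrate
    if kbps not in BITRATES_KBPS[fmt]:
        raise ValueError(f"{fmt} does not support {kbps} kbps")
    return rate, kbps
```

Calling `resolve_audio_settings("mp3")` yields the documented defaults, and an unsupported pair such as MP3 at 8kHz raises `ValueError` before any network call is made.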
Authorization header in Bearer format, for example: Bearer {{API Key}}.
Specifies which TTS model to use. Only s1 is supported.
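As an illustration of the two values described above, the sketch below assembles them into a header dictionary. It assumes the model selector is sent as a header literally named `model`, which this page does not state; `build_headers` is a hypothetical helper.

```python
def build_headers(api_key, model="s1"):
    """Build request headers for the TTS call.

    Assumption: the model selector travels in a header named "model".
    """
    if model != "s1":
        # The page states that only s1 is supported.
        raise ValueError(f"unsupported model: {model!r}")
    return {
        "Authorization": f"Bearer {api_key}",
        "model": model,
    }
```

Rejecting other model names locally is a design choice for the sketch; a real client might prefer to let the server decide.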
Request Body
Text to be converted to speech.
Controls randomness in the speech generation. Higher values (e.g., 1.0) make the output more random, while lower values (e.g., 0.1) make it more deterministic. We recommend 0.9 for the s1 model. Required range: 0 <= x <= 1
Controls diversity via nucleus sampling. Lower values (e.g., 0.1) make the output more focused, while higher values (e.g., 1.0) allow more diversity. We recommend 0.9 for the s1 model. Required range: 0 <= x <= 1
references
ReferenceAudio · object[] | null
References to be used for the speech. This requires MessagePack serialization and overrides reference_voices and reference_texts. Each reference includes the reference text corresponding to the audio.
ID of the reference model to be used for the speech.
Prosody to be used for the speech.
Chunk length to be used for the speech. Required range: 100 <= x <= 300
Whether to normalize the speech. Normalizing reduces latency but may reduce performance on numbers and dates.
format
enum<string>
default: "mp3"
Format to be used for the speech. Available options: wav, pcm, mp3, opus
Sample rate to be used for the speech.
MP3 Bitrate to be used for the speech. Available options: 64, 128, 192
Opus Bitrate to be used for the speech. Available options: -1000, 24, 32, 48, 64
latency
enum<string>
default: "normal"
Latency to be used for the speech. The balanced option reduces latency but may lead to performance degradation. Available options: normal, balanced
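The request body parameters above can be gathered into a single payload with their documented ranges enforced. This is a rough sketch only: `format` and `latency` are field names shown on this page, whereas `text`, `temperature`, `top_p` and `chunk_length` are assumed names for the parameters whose descriptions appear above, and `build_tts_body` is a hypothetical helper.

```python
import json

FORMATS = ("wav", "pcm", "mp3", "opus")
LATENCIES = ("normal", "balanced")


def build_tts_body(text, temperature=0.9, top_p=0.9, chunk_length=None,
                   fmt="mp3", latency="normal"):
    """Return a dict ready for JSON encoding.

    Assumed field names: text, temperature, top_p, chunk_length.
    The 0.9 defaults are the values recommended above for the s1 model.
    """
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must satisfy 0 <= x <= 1")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must satisfy 0 <= x <= 1")
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}")
    if latency not in LATENCIES:
        raise ValueError(f"latency must be one of {LATENCIES}")
    body = {
        "text": text,
        "temperature": temperature,
        "top_p": top_p,
        "format": fmt,
        "latency": latency,
    }
    if chunk_length is not None:
        if not 100 <= chunk_length <= 300:
            raise ValueError("chunk_length must satisfy 100 <= x <= 300")
        body["chunk_length"] = chunk_length
    return body


# Example: encode the payload as JSON bytes for an HTTP client of your choice.
payload = json.dumps(build_tts_body("Hello, world.", chunk_length=200)).encode("utf-8")
```

The sketch uses JSON because it covers the simple case; when `references` is supplied, the page states that MessagePack serialization is required instead.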
Response
The API returns the audio stream directly, in the format specified by the format parameter (default: mp3).
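Because the response is a raw audio stream rather than a JSON document, a client typically copies it to a file in chunks. The sketch below shows one way to do that for any binary file-like object with a `read(n)` method, such as the response returned by Python's `urllib.request.urlopen`; `save_audio_stream` and the file naming scheme are hypothetical.

```python
def save_audio_stream(stream, path_stem, fmt="mp3", chunk_size=8192):
    """Copy a binary stream to '<path_stem>.<fmt>' in fixed-size chunks.

    Returns (path, bytes_written). The extension simply mirrors the
    requested format; this naming is an illustrative choice.
    """
    path = f"{path_stem}.{fmt}"
    written = 0
    with open(path, "wb") as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                # An empty read signals the end of the stream.
                break
            out.write(chunk)
            written += len(chunk)
    return path, written
```

Reading in chunks keeps memory use bounded for long utterances and lets playback or further processing begin before the full response has arrived.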