Fish Audio Text to Speech
POST
For best results, upload reference audio with the Create Model API before calling this endpoint. This improves speech quality and reduces latency.
Supported output formats:
- WAV / PCM
  - Sample rates: 8kHz, 16kHz, 24kHz, 32kHz, 44.1kHz
  - Default sample rate: 44.1kHz
  - 16-bit, mono
- MP3
  - Sample rates: 32kHz, 44.1kHz
  - Default sample rate: 44.1kHz
  - Mono
  - Bitrates: 64kbps, 128kbps (default), 192kbps
- Opus
  - Sample rate: 48kHz
  - Default sample rate: 48kHz
  - Mono
  - Bitrates: -1000 (auto), 24kbps, 32kbps (default), 48kbps, 64kbps
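The format constraints above can be checked on the client before a request is sent. The Python sketch below encodes the listed combinations and fills in the documented defaults. It assumes sample rates are expressed in Hz (for example 44100 for 44.1kHz) and that bitrates use the same numbers as the option lists in the request body; the function name and the keys of the returned dictionary are hypothetical, not part of the API.

```python
from typing import Optional

# Output settings transcribed from the format list above.
# ASSUMPTION: sample rates in Hz; bitrates in kbps, with -1000 meaning "auto" for Opus.
PCM_RATES = {8000, 16000, 24000, 32000, 44100}
SUPPORTED = {
    "wav": {"rates": PCM_RATES, "default_rate": 44100, "bitrates": None, "default_bitrate": None},
    "pcm": {"rates": PCM_RATES, "default_rate": 44100, "bitrates": None, "default_bitrate": None},
    "mp3": {"rates": {32000, 44100}, "default_rate": 44100,
            "bitrates": {64, 128, 192}, "default_bitrate": 128},
    "opus": {"rates": {48000}, "default_rate": 48000,
             "bitrates": {-1000, 24, 32, 48, 64}, "default_bitrate": 32},
}


def resolve_output_settings(fmt: str, sample_rate: Optional[int] = None,
                            bitrate: Optional[int] = None) -> dict:
    """Apply documented defaults and reject unsupported combinations (hypothetical helper)."""
    spec = SUPPORTED.get(fmt)
    if spec is None:
        raise ValueError(f"unsupported format: {fmt!r}")

    rate = spec["default_rate"] if sample_rate is None else sample_rate
    if rate not in spec["rates"]:
        raise ValueError(f"{fmt} does not support a sample rate of {rate} Hz")
    settings = {"format": fmt, "sample_rate": rate}

    # WAV/PCM have no configurable bitrate in the list above.
    if spec["bitrates"] is None:
        if bitrate is not None:
            raise ValueError(f"{fmt} has no configurable bitrate")
        return settings

    kbps = spec["default_bitrate"] if bitrate is None else bitrate
    if kbps not in spec["bitrates"]:
        raise ValueError(f"{fmt} does not support a bitrate of {kbps}")
    settings["bitrate"] = kbps
    return settings
```

For example, resolve_output_settings("mp3") returns {"format": "mp3", "sample_rate": 44100, "bitrate": 128}, while resolve_output_settings("mp3", sample_rate=16000) raises ValueError because MP3 only supports 32kHz and 44.1kHz.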
Request Headers
- Request content type. Enum: application/json
- Bearer authentication, for example: Bearer {{API Key}}
- The TTS model to use. Only s1 is supported.
Request Body
- Text to be converted to speech.
- Controls randomness in speech generation. Higher values (e.g., 1.0) make the output more random, while lower values (e.g., 0.1) make it more deterministic. Recommended: 0.9 for the s1 model. Required range: 0 <= x <= 1.
- Controls diversity via nucleus sampling. Lower values (e.g., 0.1) make the output more focused, while higher values (e.g., 1.0) allow more diversity. Recommended: 0.9 for the s1 model. Required range: 0 <= x <= 1.
- References to be used for the speech. This requires MessagePack serialization and overrides reference_voices and reference_texts.
- ID of the reference model to be used for the speech.
- Prosody to be used for the speech.
- Chunk length to be used for the speech. Required range: 100 <= x <= 300.
- Whether to normalize the speech. Normalization reduces latency but may degrade handling of numbers and dates.
- Output format. Available options: wav, pcm, mp3, opus.
- Sample rate of the output. Must be one of the sample rates listed above for the chosen format.
- MP3 bitrate. Available options: 64, 128, 192.
- Opus bitrate. Available options: -1000, 24, 32, 48, 64.
- Latency mode. balanced reduces latency but may degrade quality. Available options: normal, balanced.
Response
The API directly returns the audio stream in the format specified by the format parameter (default: mp3).
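The request and response described above can be handled with Python's standard library alone. This is a sketch under several assumptions: the JSON field names (text, format, temperature, top_p, chunk_length, reference_id, latency) and the model header name are taken to match Fish Audio's upstream API and are not confirmed by this page; the endpoint URL is not given here, so the caller passes it in; build_tts_request and synthesize_to_file are hypothetical helper names. It sends a plain JSON body, not the MessagePack form that inline reference audio requires. Because the response body is the raw audio stream, it is written to disk in binary mode rather than parsed.

```python
import json
import urllib.request
from typing import Optional


def build_tts_request(
    url: str,
    api_key: str,
    text: str,
    *,
    audio_format: str = "mp3",
    temperature: float = 0.9,  # recommended value for s1
    top_p: float = 0.9,        # recommended value for s1
    chunk_length: Optional[int] = None,
    reference_id: Optional[str] = None,
    latency: str = "normal",
) -> urllib.request.Request:
    """Build (but do not send) a JSON TTS request. Hypothetical helper."""
    # Enforce the ranges and option lists stated in the parameter reference.
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if chunk_length is not None and not 100 <= chunk_length <= 300:
        raise ValueError("chunk_length must be between 100 and 300")
    if audio_format not in {"wav", "pcm", "mp3", "opus"}:
        raise ValueError(f"unsupported format: {audio_format!r}")
    if latency not in {"normal", "balanced"}:
        raise ValueError(f"unsupported latency mode: {latency!r}")

    # ASSUMPTION: field names follow Fish Audio's upstream TTS API.
    body = {
        "text": text,
        "format": audio_format,
        "temperature": temperature,
        "top_p": top_p,
        "latency": latency,
    }
    if chunk_length is not None:
        body["chunk_length"] = chunk_length
    if reference_id is not None:
        body["reference_id"] = reference_id  # model created from uploaded reference audio

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
        "model": "s1",  # ASSUMPTION: header name; s1 is the only supported model
    }
    # urllib normalizes header-name casing; HTTP header names are case-insensitive.
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def synthesize_to_file(request: urllib.request.Request, path: str) -> int:
    """Send the request, stream the returned audio bytes to `path`, return bytes written."""
    written = 0
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        # The body is raw audio, so copy it in chunks instead of decoding it.
        while chunk := response.read(64 * 1024):
            out.write(chunk)
            written += len(chunk)
    return written
```

A call might look like synthesize_to_file(build_tts_request(endpoint_url, api_key, "Hello, world!"), "hello.mp3"), where endpoint_url is the URL from this endpoint's API reference and the file extension matches the chosen format.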