POST /v3/moss-tts/v1/audio/speech
MOSS TTS v1.5
curl --request POST \
  --url https://api.novita.ai/v3/moss-tts/v1/audio/speech \
  --header 'Authorization: Bearer <API Key>' \
  --header 'Content-Type: application/json' \
  --data '{"model": "MOSS-TTS", "input": "Hello, world."}' \
  --output moss-tts-local.wav
MOSS TTS v1.5 text-to-speech API. Accepts either a JSON body or a multipart request (used to supply reference audio). The response is binary audio: a complete WAV file for non-streaming requests, or a raw PCM stream for streaming requests. When testing with curl, add --output to save the response to a file (e.g. --output moss-tts-local.wav); otherwise the binary content is printed directly to the terminal.
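The same non-streaming call can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the function name synthesize_to_file and its opener parameter are hypothetical (opener exists only so the transport can be swapped out in tests), and the request body uses just the two required fields described under Request Body.

```python
import json
import urllib.request

API_URL = "https://api.novita.ai/v3/moss-tts/v1/audio/speech"


def synthesize_to_file(api_key, text, out_path, opener=urllib.request.urlopen):
    """POST a non-streaming JSON request and write the binary response to out_path.

    `opener` is a hypothetical hook: any callable that takes a Request and
    returns a readable, closeable response object.
    """
    body = json.dumps({"model": "MOSS-TTS", "input": text}).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    # The response is binary audio, so copy it to disk in blocks
    # rather than decoding it as text.
    with opener(request) as response, open(out_path, "wb") as out_file:
        while True:
            block = response.read(64 * 1024)
            if not block:
                break
            out_file.write(block)
    return out_path
```

Calling synthesize_to_file(api_key, "Hello, world.", "moss-tts-local.wav") with a valid key would play the same role as the curl --output invocation above.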

Request Headers

Content-Type (string, required)
Supports: application/json, multipart/form-data
Authorization (string, required)
Bearer authentication format, for example: Bearer {{API Key}}.

Request Body

input (string, required)
The text to synthesize. Submitting a complete sentence or paragraph in a single request is recommended.
model (string, required, default: "MOSS-TTS")
Fixed value MOSS-TTS, which selects MOSS TTS v1.5. Allowed values: MOSS-TTS
stream (boolean, optional, default: false)
false returns a complete WAV file; true returns a PCM stream suitable for playback while audio is still being generated.
response_format (string, optional, default: "wav")
Use wav for non-streaming requests; must be pcm when stream=true. Allowed values: wav, pcm
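The interaction between stream and response_format is easy to get wrong, so a client may want to check it before sending. The following is a minimal sketch of such a check. The helper name build_payload is hypothetical, and choosing pcm automatically when stream is true is a convenience assumed here, not documented server behavior.

```python
import json


def build_payload(text, stream=False, response_format=None):
    """Build the JSON request body, enforcing the documented stream/format rule."""
    if not text:
        raise ValueError("input is required and must be non-empty")
    if response_format is None:
        # Assumed convenience: pick the format that matches the stream flag.
        response_format = "pcm" if stream else "wav"
    if response_format not in ("wav", "pcm"):
        raise ValueError("response_format must be 'wav' or 'pcm'")
    if stream and response_format != "pcm":
        raise ValueError("response_format must be 'pcm' when stream is true")
    return {
        "model": "MOSS-TTS",  # fixed value per the parameter list above
        "input": text,
        "stream": stream,
        "response_format": response_format,
    }


# Serialized form, ready to send with Content-Type: application/json.
example_body = json.dumps(build_payload("Hello, world.", stream=True))
```

Validating on the client side means a mismatched combination such as stream=true with wav fails immediately instead of after a network round trip.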

Response

On success, returns binary audio. A non-streaming response is a complete WAV file; a streaming response is a sequence of raw PCM chunks. The streaming PCM format is described by the response headers (default: 48000 Hz, mono, 16-bit little-endian). Format: binary
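Raw PCM has no header, so most players cannot open it directly. One way to make streamed output playable is to wrap the chunks in a WAV container. The sketch below uses Python's standard wave module. The function name pcm_to_wav is hypothetical, and the parameter defaults simply mirror the documented default format; in a real client they should be taken from the response headers, whose names are not given here.

```python
import wave


def pcm_to_wav(pcm_chunks, wav_path, sample_rate=48000, channels=1, sample_width=2):
    """Write an iterable of raw PCM byte chunks to a WAV file.

    Defaults assume 48000 Hz, mono, 16-bit (2 bytes per sample) little-endian
    audio; override them if the response headers say otherwise.
    """
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        for chunk in pcm_chunks:
            # Chunks are appended as they arrive; the WAV header sizes are
            # finalized when the file is closed.
            wav_file.writeframes(chunk)
    return wav_path
```

Because pcm_chunks can be any iterable, the function can consume chunks as they arrive from a streaming response without buffering the whole audio in memory.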
Last modified on June 26, 2026