Convert text to natural speech using GLM-TTS, supporting multiple voices, emotion control, and tone adjustment.
Supports: application/json
Bearer authentication format, for example: Bearer {{API Key}}.
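As a minimal sketch of the two lines above, the request headers could be assembled as follows. The `Authorization` header name is the usual carrier for Bearer tokens and is assumed here, as is the helper name `auth_headers`; neither is given on this page.

```python
def auth_headers(api_key: str) -> dict:
    """Build headers for a JSON request with Bearer authentication.

    Assumption: the token is sent in the standard `Authorization` header.
    """
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```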
Request Body
The text to convert to speech. Length limit: 0 - 1024
Speech speed. Default: 1.0. Value range: [0.5, 2]
voice
string
default:"tongtong"
required
The voice to use for audio generation, supporting both system voices and cloned voices. System voices include: tongtong (Tongtong, default voice), chuichui (Chuichui), xiaochen (Xiaochen), jam (Dongdong Zoo jam voice), kazi (Dongdong Zoo kazi voice), douji (Dongdong Zoo douji voice), luodo (Dongdong Zoo luodo voice)
Volume. Default: 1.0. Value range: (0, 10] per the description; the schema lists [0, 10], so whether 0 is accepted is unclear.
Audio output format. Default: pcm. Optional values: wav, pcm
Controls whether a watermark is added to generated AI audio. true: enables both the explicit watermark and the implicit digital watermark for AI-generated content, in compliance with policy requirements. false: disables all watermarks; this only takes effect for users who have completed the watermark removal action.
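The documented limits can be checked on the client before a request is sent. The sketch below is illustrative only: of the JSON keys, only `voice` appears on this page, so `input`, `speed`, `volume`, and `response_format` are assumed names. It also assumes the length limit counts characters and that volume uses the half-open range (0, 10].

```python
AUDIO_FORMATS = {"wav", "pcm"}

def build_tts_payload(text, voice="tongtong", speed=1.0, volume=1.0,
                      audio_format="pcm"):
    """Validate arguments against the documented limits and return a dict.

    Key names other than "voice" are assumptions, not taken from this page.
    """
    if len(text) > 1024:  # assumption: the limit is counted in characters
        raise ValueError("text exceeds the 1024 length limit")
    if not 0.5 <= speed <= 2:
        raise ValueError("speed must be in [0.5, 2]")
    if not 0 < volume <= 10:  # assumption: 0 itself is excluded
        raise ValueError("volume must be in (0, 10]")
    if audio_format not in AUDIO_FORMATS:
        raise ValueError("audio_format must be 'wav' or 'pcm'")
    # The voice is not checked against the system list because cloned
    # voices are also accepted.
    return {
        "input": text,
        "voice": voice,
        "speed": speed,
        "volume": volume,
        "response_format": audio_format,
    }
```

Validating locally keeps obviously invalid requests from reaching the service; the server remains the authority on what is actually accepted.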
Response
Request processed successfully. The recommended sample rate is 24000 Hz.
Format: binary
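Raw pcm output carries no header, so most players need it wrapped in a WAV container before playback. The following sketch uses Python's standard `wave` module with the recommended 24000 Hz sample rate. It assumes 16-bit mono samples, which this page does not state, so adjust `channels` and `sample_width` if the actual output differs.

```python
import io
import wave

def pcm_to_wav(pcm_bytes: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the WAV bytes.

    Assumption: 16-bit (2-byte) mono samples; only the sample rate is
    taken from the response description.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()
```

The returned bytes can be written to a `.wav` file as-is. Requesting the wav output format instead avoids this step.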