GLM Text to Speech

curl --request POST \
  --url https://api.novita.ai/v3/glm-tts \
  --header 'Authorization: Bearer <API Key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "input": "<string>",
  "speed": 1,
  "voice": "tongtong",
  "volume": 1,
  "response_format": "pcm",
  "watermark_enabled": true
}
'

POST

glm-tts

Convert text to natural speech using GLM-TTS, supporting multiple voices, emotion control, and tone adjustment.

Request Headers

Content-Type

string

required

Supports: application/json

Authorization

string

required

Bearer authentication format, for example: Bearer {{API Key}}.

Request Body

input

string

required

The text to convert to speech. Length limit: 0 - 1024

speed

number

default:1

Speech speed. Default is 1.0. Value range: [0.5, 2]

voice

string

default:"tongtong"

required

The voice to use for audio generation. Both system voices and cloned voices are supported. System voices include: tongtong (Tongtong, default voice), chuichui (Chuichui), xiaochen (Xiaochen), jam (Dongdong Zoo jam voice), kazi (Dongdong Zoo kazi voice), douji (Dongdong Zoo douji voice), luodo (Dongdong Zoo luodo voice).

volume

number

default:1

Volume. Default is 1.0, range (0, 10]. Value range: [0, 10]

response_format

string

default:"pcm"

Audio output format. Defaults to pcm. Optional values: wav, pcm

watermark_enabled

boolean

Controls whether a watermark is added to generated AI audio. true: enables the explicit watermark and the implicit digital watermark for AI-generated content by default, complying with policy requirements. false: disables all watermarks; this takes effect only for users who have completed the watermark removal action.
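
The request body fields above can be checked on the client before a request is sent. The following is a minimal sketch using only the Python standard library; the helper names `build_payload` and `build_request` are hypothetical and not part of the API. It assumes the 1024 length limit counts characters and applies the stricter (0, 10] reading of the volume range, neither of which this page confirms.

```python
import json
import urllib.request

API_URL = "https://api.novita.ai/v3/glm-tts"


def build_payload(text, voice="tongtong", speed=1.0, volume=1.0,
                  response_format="pcm", watermark_enabled=True):
    """Check arguments against the documented limits and return the body dict."""
    # Assumption: the documented length limit (0 - 1024) is in characters.
    if len(text) > 1024:
        raise ValueError("input must be at most 1024 characters")
    if not 0.5 <= speed <= 2:
        raise ValueError("speed must be within [0.5, 2]")
    # Assumption: the stricter documented range (0, 10] is used here.
    if not 0 < volume <= 10:
        raise ValueError("volume must be within (0, 10]")
    if response_format not in ("wav", "pcm"):
        raise ValueError("response_format must be 'wav' or 'pcm'")
    # voice is not validated: cloned voices are allowed besides system voices.
    return {
        "input": text,
        "speed": speed,
        "voice": voice,
        "volume": volume,
        "response_format": response_format,
        "watermark_enabled": watermark_enabled,
    }


def build_request(api_key, payload):
    """Build (but do not send) the POST request with the required headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` and reading the response body should yield the binary audio; that step is left out here so the sketch runs without network access or a real API key.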

Response

Request processed successfully. The recommended sample rate is 24000 Hz. Format: binary
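
Because the default response_format is pcm, the response body has no header describing the audio, so most players need it wrapped in a container first. The sketch below wraps raw PCM bytes in a WAV container with Python's standard `wave` module, using the recommended sample rate of 24000. The function name `pcm_to_wav` is hypothetical, and the 16-bit mono sample layout is an assumption that this page does not state; adjust `channels` and `sample_width` if the actual output differs.

```python
import io
import wave


def pcm_to_wav(pcm_bytes, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw PCM bytes in a WAV container and return the WAV bytes.

    Assumption: 16-bit (sample_width=2) mono audio; only the 24000
    sample rate comes from the documentation.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()
```

Requesting "response_format": "wav" instead should make this step unnecessary.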
