- The uploaded audio file must be in mp3, m4a, or wav format.
- The duration of the uploaded audio must be at least 10 seconds and no more than 5 minutes.
- The uploaded audio file size must not exceed 20 MB.
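The three constraints above can be checked locally before uploading. The following is a minimal sketch, not part of the API: the function name `check_source_audio` is hypothetical, the duration is assumed to be measured by the caller, and 20 MB is assumed to mean 20 × 1024 × 1024 bytes.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp3", ".m4a", ".wav"}
MIN_DURATION_S = 10.0
MAX_DURATION_S = 5 * 60.0
MAX_SIZE_BYTES = 20 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_source_audio(filename: str, size_bytes: int, duration_s: float) -> list:
    """Return a list of constraint violations; an empty list means the file passes."""
    problems = []
    # Format is judged by file extension only; the content is not inspected.
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("format must be mp3, m4a, or wav")
    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        problems.append("duration must be between 10 seconds and 5 minutes")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file size must not exceed 20 MB")
    return problems
```

Returning every violation at once, rather than raising on the first one, lets a caller report all problems with a file in a single pass.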
Request Headers
Content-Type
Enum:
application/json
Authorization
Bearer authentication format, for example: Bearer {{API Key}}.
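As an illustration, the headers could be assembled as below. This sketch assumes the two values described above belong to the standard HTTP `Content-Type` and `Authorization` headers; `build_headers` is a hypothetical helper name.

```python
def build_headers(api_key: str) -> dict:
    """Build request headers for a JSON request with Bearer authentication."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {
        # Assumption: conventional HTTP header names.
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
```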
Request Body
The URL of the audio file to be cloned. Supported formats: mp3, m4a, wav.
clone_prompt
Voice cloning parameters. Providing this parameter can help improve the similarity and stability of the synthesized voice. If this parameter is used, you must also upload a short sample audio (duration less than 8 seconds) and the corresponding transcript. Supported audio formats: mp3, m4a, wav.
Voice cloning parameter. Maximum 200 characters. If this field is provided, the service will compare the audio and the text; if the difference is too large, error code 1043 will be returned.
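The limits on the sample audio and its transcript can also be checked client-side. This is a rough sketch under stated assumptions: `check_clone_prompt` and its argument names are hypothetical, and characters are counted with Python's `len`, which may not match how the service counts them.

```python
MAX_SAMPLE_DURATION_S = 8.0   # sample audio must be shorter than this
MAX_TRANSCRIPT_CHARS = 200


def check_clone_prompt(sample_duration_s: float, transcript: str) -> list:
    """Return a list of violations for the sample audio and its transcript."""
    problems = []
    if not 0 < sample_duration_s < MAX_SAMPLE_DURATION_S:
        problems.append("sample audio must be shorter than 8 seconds")
    if not transcript:
        problems.append("a transcript of the sample audio is required")
    elif len(transcript) > MAX_TRANSCRIPT_CHARS:
        problems.append("transcript must not exceed 200 characters")
    return problems
```

A local check like this cannot detect a mismatch between the audio and the text; that comparison happens on the service side and is reported as error code 1043.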
Voice cloning preview parameter. The model will use the cloned voice to synthesize this text and return the result as an audio URL for preview. Maximum 2000 characters. Note: Preview will be charged according to the number of characters, at the same rate as T2A APIs.
Voice cloning preview parameter. Specifies the speech model to use for preview. Required if the “text” field is provided.
Options: speech-02-hd, speech-02-turbo, speech-2.5-hd-preview, speech-2.5-turbo-preview
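The preview rules (at most 2000 characters, and a model is required whenever preview text is given) can be sketched as follows. The field names `text` and `model` come from this page; the helper name `preview_fields` is hypothetical, and character counting with `len` is an assumption.

```python
from typing import Optional

PREVIEW_MODELS = {
    "speech-02-hd",
    "speech-02-turbo",
    "speech-2.5-hd-preview",
    "speech-2.5-turbo-preview",
}
MAX_PREVIEW_CHARS = 2000


def preview_fields(text: Optional[str], model: Optional[str]) -> dict:
    """Return the preview part of the request body, or {} if no preview is requested."""
    if text is None:
        return {}
    if len(text) > MAX_PREVIEW_CHARS:
        raise ValueError("preview text must not exceed 2000 characters")
    if model is None:
        raise ValueError("model is required when text is provided")
    if model not in PREVIEW_MODELS:
        raise ValueError(f"unsupported preview model: {model}")
    return {"text": text, "model": model}
```

Because previews are billed per character, rejecting oversized text before sending the request avoids a wasted call.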
Voice cloning parameter. Value range: [0, 1]. If provided, sets the text validation accuracy threshold. Default is 0.7 if not specified.
Voice cloning parameter. Whether to enable noise reduction. Defaults to false if not specified.
Voice cloning parameter. Whether to enable volume normalization. Defaults to false if not specified.
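The three optional settings above and their documented defaults could be modeled like this. The class and attribute names are hypothetical and are not the API's JSON field names, which this page does not show.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloneOptions:
    # Hypothetical attribute names; defaults mirror the documented ones.
    accuracy_threshold: float = 0.7      # text validation threshold, range [0, 1]
    noise_reduction: bool = False
    volume_normalization: bool = False

    def __post_init__(self) -> None:
        if not 0.0 <= self.accuracy_threshold <= 1.0:
            raise ValueError("accuracy threshold must be in the range [0, 1]")
```

For example, `CloneOptions(noise_reduction=True)` keeps the default threshold of 0.7 while enabling noise reduction.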
Response
If both the preview text (text) and the preview model (model) are provided in the request body, this parameter returns the preview audio as a URL.
The generated voice_id.
Example
Below is an example of how to use the MiniMax Voice Cloning API to clone a voice.
Request:
Response: