POST /v3/async/wan2.7-r2v
Wan 2.7 Reference-to-Video
curl --request POST \
  --url https://api.novita.ai/v3/async/wan2.7-r2v \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: <content-type>' \
  --data '
{
  "seed": 123,
  "size": "<string>",
  "audio": true,
  "media": [
    {
      "url": "<string>",
      "type": "<string>",
      "reference_voice": "<string>"
    }
  ],
  "prompt": "<string>",
  "duration": 123,
  "shot_type": "<string>",
  "watermark": true,
  "negative_prompt": "<string>"
}
'
{
  "task_id": "<string>"
}
Wan 2.7 Reference-to-Video is a model with multimodal input support (text, image, video). It can generate single-character performances or multi-character interaction videos from reference characters, and it supports intelligent multi-shot generation. Output resolutions are 720P and 1080P, duration is 2 to 10 seconds, and usage is billed per second. Output includes audio by default.
This is an asynchronous API: the response contains only a task_id. Pass that task_id to the Task Result API to retrieve the video generation results.
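As a rough illustration of the submit step, the sketch below builds the same POST request as the curl sample using only the Python standard library. The helper names (`build_submit_request`, `submit`) and the 30-second timeout are assumptions made for this example, not part of the API. Polling the Task Result API is left out because its endpoint is not described on this page.

```python
import json
import urllib.request

API_URL = "https://api.novita.ai/v3/async/wan2.7-r2v"


def build_submit_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for the async endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            # Bearer format, as described under Request Headers.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def submit(api_key: str, payload: dict) -> dict:
    """Send the request and return the decoded JSON response.

    Hypothetical helper: the timeout value is an arbitrary choice.
    """
    request = build_submit_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the headers and body before any network call is made.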

Request Headers

Content-Type
string
required
Supports: application/json
Authorization
string
required
Bearer authentication format, for example: Bearer {{API Key}}.

Request Body

seed
integer
Random seed for improving reproducibility. Value range: [0, 2147483647]
size
string
default:"1920*1080"
Output video resolution (width*height); affects pricing. 720P: 1280*720 (16:9), 720*1280 (9:16), 960*960 (1:1), 1088*832 (4:3), 832*1088 (3:4). 1080P: 1920*1080 (16:9), 1080*1920 (9:16), 1440*1440 (1:1), 1632*1248 (4:3), 1248*1632 (3:4).
Optional values: 1280*720, 720*1280, 960*960, 1088*832, 832*1088, 1920*1080, 1080*1920, 1440*1440, 1632*1248, 1248*1632
audio
boolean
default:true
Whether to generate audio in the video; affects pricing. Defaults to true (with audio).
media
array
required
Reference media array used to extract character appearance, motion, and voice. Items map to character1, character2, etc. in order. Images: 0-5; videos: 0-3; total <= 5. Image formats: JPEG, JPG, PNG, BMP, WEBP; resolution [240, 8000] px; max 10MB. Video formats: MP4, MOV; duration 1-30s; max 100MB. Audio formats: MP3, WAV, FLAC; duration 3-30s.
Array length: 1 - 5
prompt
string
required
Text prompt describing the desired video content. Use character1, character2, etc. to reference characters from the media array in order. Each reference (video or image) contains a single character. Supports Chinese and English.
Length limit: 0 - 1500
duration
integer
default:5
Video duration in seconds, billed per second. Value range: [2, 10] (integer)
shot_type
string
default:"single"
Shot type. single = one continuous shot (default); multi = multiple shots with transitions. Takes priority over the prompt.
Optional values: single, multi
watermark
boolean
default:false
Whether to add a watermark to the output video (bottom-right corner).
negative_prompt
string
Negative prompt describing undesired content in the video. Supports Chinese and English.
Length limit: 0 - 500
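The constraints listed above can be checked on the client before a request is submitted. The following is a minimal sketch of such a check, assuming the documented ranges are enforced exactly as written. `validate_body` is a hypothetical helper, not part of the API. It does not check per-type media counts (images 0-5, videos 0-3), because the accepted values of the media `type` field are not listed on this page.

```python
import re

SIZES_720P = {"1280*720", "720*1280", "960*960", "1088*832", "832*1088"}
SIZES_1080P = {"1920*1080", "1080*1920", "1440*1440", "1632*1248", "1248*1632"}


def _is_int_in(value, low, high) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and low <= value <= high


def validate_body(body: dict) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []

    media = body.get("media")
    if not isinstance(media, list) or not 1 <= len(media) <= 5:
        problems.append("media must be an array of 1 to 5 items")
        media = []

    prompt = body.get("prompt")
    if not isinstance(prompt, str) or len(prompt) > 1500:
        problems.append("prompt is required and limited to 1500 characters")
        prompt = ""

    # Each characterN in the prompt should map to the N-th media item.
    referenced = {int(n) for n in re.findall(r"character(\d+)", prompt)}
    for n in sorted(referenced):
        if not 1 <= n <= len(media):
            problems.append(f"prompt references character{n} but media has {len(media)} item(s)")

    size = body.get("size", "1920*1080")
    if size not in SIZES_720P | SIZES_1080P:
        problems.append(f"unsupported size: {size}")

    if not _is_int_in(body.get("duration", 5), 2, 10):
        problems.append("duration must be an integer in [2, 10]")

    if "seed" in body and not _is_int_in(body["seed"], 0, 2147483647):
        problems.append("seed must be an integer in [0, 2147483647]")

    if body.get("shot_type", "single") not in ("single", "multi"):
        problems.append("shot_type must be 'single' or 'multi'")

    negative = body.get("negative_prompt", "")
    if not isinstance(negative, str) or len(negative) > 500:
        problems.append("negative_prompt is limited to 500 characters")

    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a request body at once.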

Response

task_id
string
Use the task_id to request the Task Result API to retrieve the generated outputs.
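A small sketch of reading the response body follows. It assumes the body is the JSON object shown in the sample response, with a single string field `task_id`. `extract_task_id` is a hypothetical helper name chosen for this example.

```python
import json


def extract_task_id(response_body: str) -> str:
    """Parse the JSON response and return its task_id, or raise ValueError."""
    data = json.loads(response_body)
    task_id = data.get("task_id") if isinstance(data, dict) else None
    if not isinstance(task_id, str) or not task_id:
        raise ValueError("response does not contain a task_id string")
    return task_id
```

Validating the field up front avoids passing an empty or missing identifier to the Task Result API.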