Wan 2.7 Image-to-Video model with multimodal input support (text/image/audio/video). Capable of first-frame-to-video, first+last-frame-to-video, and video continuation. Supports 720P and 1080P resolutions, duration 2~15 seconds, billed per second. Output includes audio by default.
This is an asynchronous API: the request returns only a task_id. Pass that task_id to the Task Result API to retrieve the video generation results.
Supports: application/json
Bearer authentication format, for example: Bearer {{API Key}}.
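As a minimal sketch, the request headers could be assembled as shown here. The helper name `build_headers` is hypothetical; only the Bearer format and the application/json content type come from this page.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Return HTTP headers for a JSON request with Bearer authentication."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        # Format from this page: "Bearer {{API Key}}"
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

The resulting dictionary can be passed to whichever HTTP client is in use.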
Request Body
Random seed for improving reproducibility. Value range: [0, 2147483647].
Text prompt describing the desired video content. Supports Chinese and English. Length limit: 0 - 5000 characters.
Video duration in seconds, billed per second. Integer. Value range: [2, 15].
First frame image URL. Supported formats: JPEG, JPG, PNG (no transparency), BMP, WEBP. Resolution: width and height in [240, 8000] pixels, aspect ratio 1:8~8:1, max 20MB. Mutually exclusive with first_clip_url; one of the two must be provided.
Add watermark to the output video (bottom-right corner).
Output video resolution tier; affects pricing. The video aspect ratio matches the input media. Optional values: 720P, 1080P
Enable intelligent prompt rewriting using an LLM. Improves generation quality for short prompts but increases processing time.
First video clip URL for video continuation. The model extends the video based on its content. Supported formats: mp4, mov. Duration: 2~10s, resolution: width and height in [240, 4096] pixels, aspect ratio 1:8~8:1, max 100MB. Mutually exclusive with image_url.
Last frame image URL. Combined with first frame to generate first+last frame video. Same format restrictions as first frame.
Negative prompt describing undesired content in the video. Supports Chinese and English. Length limit: 0 - 500 characters.
Driving audio URL. When provided, the model uses it as the driving source for video generation (e.g., lip sync, motion beats). When omitted, the model auto-generates matching background music or sound effects. Supported formats: wav, mp3. Duration: 2~30s, max 15MB.
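The constraints above can be checked on the client before a request is sent. The following is a rough sketch, not the service's own validation. Only `image_url` and `first_clip_url` are field names that appear on this page; `seed`, `prompt`, `negative_prompt`, `duration`, and `resolution` are assumed names used for illustration, and the function `validate_request` is hypothetical.

```python
from typing import Any

MAX_SEED = 2147483647
RESOLUTIONS = {"720P", "1080P"}


def validate_request(body: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    errors: list[str] = []

    # image_url and first_clip_url are mutually exclusive; one is required.
    has_image = bool(body.get("image_url"))
    has_clip = bool(body.get("first_clip_url"))
    if has_image == has_clip:
        errors.append("provide exactly one of image_url or first_clip_url")

    # Assumed field name: duration (integer seconds in [2, 15]).
    duration = body.get("duration")
    if duration is not None:
        if not isinstance(duration, int) or not 2 <= duration <= 15:
            errors.append("duration must be an integer in [2, 15]")

    # Assumed field name: seed (integer in [0, 2147483647]).
    seed = body.get("seed")
    if seed is not None:
        if not isinstance(seed, int) or not 0 <= seed <= MAX_SEED:
            errors.append("seed must be an integer in [0, 2147483647]")

    # Assumed field names: prompt (max 5000 chars), negative_prompt (max 500).
    if len(body.get("prompt", "")) > 5000:
        errors.append("prompt exceeds 5000 characters")
    if len(body.get("negative_prompt", "")) > 500:
        errors.append("negative_prompt exceeds 500 characters")

    # Assumed field name: resolution (720P or 1080P).
    resolution = body.get("resolution")
    if resolution is not None and resolution not in RESOLUTIONS:
        errors.append("resolution must be 720P or 1080P")

    return errors
```

An empty list means the body passed these local checks; media constraints such as file size, pixel dimensions, and clip duration still need to be checked against the files themselves.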
Response
Use the task_id to request the Task Result API to retrieve the generated outputs.
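Because the result is retrieved separately, a client typically polls until the task finishes. The sketch below shows one way to structure that loop. The Task Result API is not described on this page, so the call is injected as a `fetch_task` function supplied by the caller; the `status` field and the values `"succeeded"` and `"failed"` are assumptions, not documented behavior.

```python
import time
from typing import Any, Callable


def wait_for_result(
    task_id: str,
    fetch_task: Callable[[str], dict[str, Any]],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> dict[str, Any]:
    """Poll fetch_task(task_id) until a terminal status or the timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_task(task_id)
        # Assumed response shape: a "status" field with terminal values
        # "succeeded" or "failed". Adjust to the real Task Result API.
        if result.get("status") in ("succeeded", "failed"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout_s}s")
        time.sleep(interval_s)
```

Injecting `fetch_task` keeps the loop independent of any particular HTTP client and makes it easy to exercise with a stub.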