
Point-Level Motion Control Finally Reaches Commercial Quality

December 9, 2025

Official Wan-Move demonstration showcasing point-level motion control with latent trajectory guidance

Motion control in video generation has existed for years, but the results rarely matched the quality of commercial systems. Wan-Move just changed that equation.

Released by Alibaba's Tongyi Lab and academic partners, this NeurIPS 2025 paper introduces point-level motion control that rivals Kling 1.5 Pro's Motion Brush. The system generates 5-second, 480p videos with precise control over how every element moves in the scene.

The breakthrough comes from latent trajectory guidance, which propagates features along dense point trajectories without requiring architecture changes to the base model. This approach enables scalable training and fine-grained control that previous methods couldn't achieve.

The Motion Control Problem

Existing motion-controllable methods face two core limitations. First, they provide coarse control granularity, making it difficult to specify how individual elements should move. Second, they lack scalability, limiting their practical application to professional video production.

These constraints result in outputs that fall short of commercial quality standards. Users need either complex technical setups or expensive commercial licenses to achieve production-ready results.

Wan-Move addresses both issues through dense point trajectories and latent space integration. The system allows fine-grained control while building directly on Wan-I2V-14B without auxiliary motion encoders or architectural modifications.

How Latent Trajectory Guidance Works

The core innovation is representing object motions through dense point trajectories projected into latent space. Instead of encoding motion separately, the system propagates first-frame features along each trajectory to create an aligned spatiotemporal feature map.

This feature map serves as the updated latent condition, telling the model how each scene element should move. The approach integrates naturally into image-to-video models without architecture changes, making it easily scalable through standard training procedures.

The result is motion guidance that provides both precision and flexibility. Users can specify complex multi-object motions through point trajectories while the model handles temporal consistency and visual quality.
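To make the mechanism concrete, here is a minimal sketch of the propagation step in PyTorch. The function name, tensor shapes, and nearest-cell writing are assumptions for illustration; the paper works on the model's actual latent grid, and the released code should be treated as authoritative.

    import torch

    def build_motion_condition(first_frame_latent, trajectories, visibility, num_frames):
        # first_frame_latent: (C, H, W) latent features of the first frame
        # trajectories:       (N, T, 2) integer (x, y) positions of N points on the latent grid
        # visibility:         (N, T) boolean mask, True where a point is visible
        C, H, W = first_frame_latent.shape
        condition = torch.zeros(num_frames, C, H, W, dtype=first_frame_latent.dtype)
        traj = trajectories.long()
        # Each point carries the feature sampled at its first-frame location.
        x0 = traj[:, 0, 0].clamp(0, W - 1)
        y0 = traj[:, 0, 1].clamp(0, H - 1)
        point_features = first_frame_latent[:, y0, x0]            # (C, N)
        for t in range(num_frames):
            xt = traj[:, t, 0].clamp(0, W - 1)
            yt = traj[:, t, 1].clamp(0, H - 1)
            vis = visibility[:, t]
            # Write each visible point's first-frame feature at its frame-t position.
            condition[t, :, yt[vis], xt[vis]] = point_features[:, vis]
        return condition  # (T, C, H, W) spatiotemporal feature map used as the latent condition

In the full model this feature map plays the role of the updated latent condition for the image-to-video backbone, which is why no separate motion encoder or architecture change is needed.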

Single-Object Motion Control

Input image with trajectory annotation (left) and generated video output (right) demonstrating large-scale single object motion control

Wan-Move handles large-scale motions for single objects with precise trajectory following. The system maintains object appearance and scene coherence while executing complex movement patterns across the video duration.

Input image with trajectory annotation (left) and generated video output (right) showing physically plausible motion behavior

The model respects physical constraints during motion execution. Objects move naturally within their environment, maintaining appropriate interactions with surrounding elements and following realistic physics.

Multi-Object Motion Control

Input image with multiple trajectory annotations (left) and generated video output (right) demonstrating independent control of multiple objects

Wan-Move extends single object capabilities to multiple objects moving independently. Each object follows its specified trajectory while the system manages interactions and maintains scene consistency.

This multi-object control enables complex choreographed sequences where different elements move at different speeds and directions. The model handles occlusions naturally and preserves spatial relationships between objects throughout the motion.
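As a rough illustration of how several independent motions might be specified as input, the sketch below builds one straight-line point track per object for a 5-second clip; the frame count, coordinates, and array layout are assumptions rather than the repository's actual interface.

    import numpy as np

    def linear_track(start_xy, end_xy, num_frames):
        # Interpolate a single point from start to end across the clip.
        t = np.linspace(0.0, 1.0, num_frames)[:, None]                    # (T, 1)
        return (1 - t) * np.asarray(start_xy) + t * np.asarray(end_xy)    # (T, 2)

    num_frames = 81  # illustrative; the real count depends on the model's frame rate
    car_track  = linear_track((40, 200), (420, 210), num_frames)   # object 1 moves right
    ball_track = linear_track((300, 60), (120, 260), num_frames)   # object 2 moves down-left
    trajectories = np.stack([car_track, ball_track])               # (2, T, 2)
    visibility   = np.ones(trajectories.shape[:2], dtype=bool)     # both points always visible

A dense specification would track many points per object rather than a single one, which is what gives the method its fine-grained control.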

Motion Transfer Capabilities

Source video with motion pattern (left) and target image with transferred motion (right) showing motion transfer between different scenes

The system supports motion transfer, where movement patterns from one video can be applied to different images. This capability enables reusing complex motion sequences across various content while adapting them to new visual contexts.

Motion transfer maintains the timing and spatial characteristics of the source motion while adapting to the target image's content. The transferred motion respects the new scene's structure and maintains visual consistency.
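A plausible way to realize this, sketched below, is to estimate point tracks on the source video with an off-the-shelf point tracker and rescale them to the target image's resolution before using them as the motion condition; the function and the tracking step are assumptions for illustration, not the released pipeline.

    import numpy as np

    def rescale_trajectories(tracks, src_size, dst_size):
        # tracks: (N, T, 2) pixel trajectories estimated on the source video
        # src_size, dst_size: (width, height) of the source video and the target image
        scale = np.array([dst_size[0] / src_size[0], dst_size[1] / src_size[1]])
        return tracks * scale  # timing is untouched; only the spatial extent is adapted

    # Example: tracks from a 1280x720 source clip reused on an 832x480 target image.
    source_tracks = np.random.rand(64, 81, 2) * np.array([1280.0, 720.0])  # placeholder tracks
    target_tracks = rescale_trajectories(source_tracks, (1280, 720), (832, 480))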

3D Rotation Control

3D rotation demonstration showing camera perspective change around the scene

Wan-Move handles 3D rotations for camera control and object manipulation. The system maintains perspective consistency and proper depth relationships as the viewpoint changes throughout the video.

Camera control enables cinematic movements like orbits, dolly shots, and perspective shifts. These movements combine with object motion control for sophisticated video generation scenarios.
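Camera moves can be expressed through the same trajectory interface: every tracked point shifts consistently with the intended viewpoint change. The sketch below generates a uniform horizontal pan over a regular point grid as a minimal example; a true orbit or dolly would need depth-aware trajectories, and the grid spacing and resolution here are assumptions.

    import numpy as np

    def pan_trajectories(points_xy, shift_px, num_frames):
        # points_xy: (N, 2) first-frame pixel positions
        # shift_px:  total horizontal displacement over the clip (negative pans the other way)
        offsets = np.linspace(0.0, shift_px, num_frames)                        # (T,)
        tracks = np.repeat(points_xy[:, None, :], num_frames, axis=1).astype(float)
        tracks[..., 0] += offsets                                               # per-frame x offset
        return tracks                                                           # (N, T, 2)

    grid = np.stack(np.meshgrid(np.arange(0, 832, 32), np.arange(0, 480, 32)), -1).reshape(-1, 2)
    pan_tracks = pan_trajectories(grid, shift_px=-120, num_frames=81)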

Performance Validation Through User Studies

User studies demonstrate that Wan-Move's motion controllability matches Kling 1.5 Pro's commercial Motion Brush system. The comparison evaluated motion accuracy, temporal consistency, and overall visual quality across diverse content categories.

The system achieves this performance through scaled training on high-quality motion data. By building on Wan-I2V-14B and training with dense trajectory annotations, Wan-Move reaches commercial-grade results without requiring proprietary infrastructure.

Try AI FILMS Studio's video generation tools to explore AI video creation capabilities.

MoveBench: A Dedicated Motion Control Benchmark

The researchers released MoveBench, a comprehensive benchmark for evaluating motion controllable video generation. The dataset features diverse content categories, longer video durations (5 seconds), and high-quality trajectory annotations verified through hybrid validation.

MoveBench addresses limitations in existing benchmarks by providing larger data volume and more rigorous annotation standards. The benchmark supports both English and Chinese prompts, enabling cross-language evaluation of motion control systems.

The dataset includes single-object and multi-object motion scenarios with varying complexity levels. Annotations specify exact point trajectories and visibility information, enabling precise evaluation of motion fidelity.
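Motion fidelity against such annotations can be scored with something as simple as the average distance between the specified and observed point positions, as in the sketch below; MoveBench's official metrics and evaluation scripts may differ, so treat this only as the general idea.

    import numpy as np

    def mean_trajectory_error(target_tracks, observed_tracks, visibility):
        # target_tracks, observed_tracks: (N, T, 2) pixel trajectories
        # visibility: (N, T) boolean mask restricting the error to annotated-visible points
        diff = np.linalg.norm(target_tracks - observed_tracks, axis=-1)  # (N, T)
        return float(diff[visibility].mean())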

Technical Implementation

Wan-Move builds on the Wan-I2V-14B image-to-video model without requiring architecture modifications. The latent trajectory guidance approach integrates motion control through updated latent conditions rather than separate motion encoding modules.

This design choice enables easy scaling through standard training procedures. The system doesn't need specialized motion encoders or complex multi-stage training, reducing technical complexity and computational requirements.

Implementation requires PyTorch 2.4.0 or higher and standard GPU configurations. The model supports both single-GPU and multi-GPU inference through FSDP and xDiT USP acceleration. Memory-constrained setups can use model offloading options.
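Before running the model, a quick check along these lines confirms that the environment meets the stated version floor; the snippet is only a convenience sketch, and the actual launch commands for FSDP or xDiT USP inference live in the repository.

    import torch
    from packaging import version

    # Version floor taken from the requirement above; CUDA/driver needs are documented upstream.
    assert version.parse(torch.__version__.split("+")[0]) >= version.parse("2.4.0"), \
        "Wan-Move expects PyTorch 2.4.0 or newer"
    print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}, "
          f"visible GPUs: {torch.cuda.device_count()}")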

Availability and Licensing

Wan-Move is available as open-source software under the Apache 2.0 license, permitting commercial use. The model weights are hosted on both Hugging Face and ModelScope, with inference code available on GitHub.

The complete package includes the Wan-Move-14B-480P model for 5-second, 480p video generation, the MoveBench dataset for evaluation, and visualization tools for trajectory effects. Installation requires minimal dependencies beyond the base Wan2.1 setup.

Users can run inference on single-GPU or multi-GPU configurations depending on their hardware. The repository provides example cases and evaluation scripts for both single-object and multi-object motion scenarios.
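For fetching the released artifacts programmatically, one option is the huggingface_hub client with the repository IDs listed at the end of this post; the local directory names below are arbitrary, and the project README may recommend a different route.

    from huggingface_hub import snapshot_download

    # Model weights for 5-second, 480p generation.
    snapshot_download(repo_id="Ruihang/Wan-Move-14B-480P", local_dir="Wan-Move-14B-480P")

    # MoveBench evaluation data.
    snapshot_download(repo_id="Ruihang/MoveBench", repo_type="dataset", local_dir="MoveBench")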

Comparison With Existing Methods

Qualitative comparisons show Wan-Move competing with both academic methods and commercial solutions. The system demonstrates advantages in motion precision, temporal consistency, and visual quality across varied content types.

Academic methods like Tora provide motion control but often lack the fine-grained precision needed for professional applications. Commercial solutions like Kling 1.5 Pro offer high quality but require paid subscriptions and lack code access for research purposes.

Wan-Move bridges this gap by providing commercial-quality results through open-source code and freely available model weights. This accessibility enables researchers and developers to build on the system's capabilities.

Research Implications

The latent trajectory guidance approach demonstrates that motion control can be achieved without auxiliary encoders or architecture modifications. This finding suggests simpler paths toward integrating motion capabilities into existing video generation models.

The system's scalability through standard training procedures indicates that commercial quality motion control doesn't require specialized infrastructure. Research teams with standard GPU setups can train and deploy comparable systems.

MoveBench's release provides a standardized evaluation framework for future motion control research. The benchmark's rigorous annotation standards and diverse content categories support reproducible comparisons across methods.

Future Directions

The research team plans to release Gradio demos for interactive motion control exploration. These interfaces will enable users to draw trajectories directly and see generated results without command line interaction.

Current capabilities focus on 5 second, 480p generation. Future work may explore longer durations and higher resolutions while maintaining motion control quality. The scalable architecture suggests these extensions are technically feasible.

Integration with other video editing capabilities could enable comprehensive video production workflows. Combining motion control with style transfer, object editing, and effects processing would expand practical applications.

Paper: arXiv:2512.08765
Project Page: wan-move.github.io
GitHub: github.com/ali-vilab/Wan-Move
Model Weights: huggingface.co/Ruihang/Wan-Move-14B-480P
Benchmark: huggingface.co/datasets/Ruihang/MoveBench