InSpatio-World: Open Source 4D World Model From Video

March 18, 2026

InSpatio-World is the first open source real time 4D world model that takes a video as input and turns it into a dynamic, navigable world. You can roam freely across viewpoints, control time forward and backward, and build on top of it.

InSpatio-World 4D world model demo showing navigable real time scene from a reference video
Still from demo video by InSpatio-World

What It Does

Existing 2D and static models capture a single perspective of a scene. The physical world has three spatial dimensions and one time dimension. InSpatio-World models all four, conditioning the output on a reference video and letting you explore the resulting world interactively.

The 1.3 billion parameter model runs at 24 FPS on a single GPU. It ranks first among all real time methods on the WorldScore-Dynamic leaderboard, a benchmark that evaluates 3D, 4D, and video generation systems on controllability, visual quality, and dynamic consistency.

Four Core Capabilities

Free Spatial Roaming

Immerse yourself in the scene and experience the same event from diverse vantage points.

Free spatial roaming across different vantage points of the same scene

Temporal Control

Pause, slow down, or even reverse time to re-experience captured moments.

Temporal control: pause, reverse, or slow down time within the world

Physical Realism

Drawing on the natural dynamics of the reference video, the model keeps generated motion physically consistent and realistic.

Physical realism: consistent dynamics derived from the reference video

Long-Horizon Stability

Even under extended exploration, the world remains anchored to the reference video, preventing drift and preserving consistency with the source scene.

Long horizon stability: the generated world stays consistent over extended exploration

The Technical Problem It Solves

Generative video models simulate pixels rather than persistent worlds. That leads to three failures: physical inconsistency (objects interpenetrate or float), spatial fragility (objects outside the frame become unstable), and temporal drift (world state degrades over long sequences).

InSpatio-World addresses all three with State-Anchored World Modeling. The reference video is anchored as a viewpoint-independent Local World State. All generated observations are sampled from this state rather than from frame history alone. Three components implement this:

  • World State Anchoring: builds a persistent world state ensuring spatial and physical constancy
  • Spatiotemporal Autoregression: performs precise sampling conditioned on the reference video, enabling free navigation across viewpoints and time
  • Joint Distribution Matching Distillation: balances real world fidelity with synthetic controllability, enabling stable generalization under user interaction

The result is spatiotemporally consistent sampling that mitigates long term drift. The model knows where it is in space and time at every step.
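The core idea can be sketched in a few lines. This is an illustrative toy, not the InSpatio-World implementation: every name below (`WorldState`, `anchor_world_state`, `sample_observation`) is invented for the example. The point it demonstrates is the conditioning structure: observations are sampled from a persistent state built once from the reference video, given an explicit (viewpoint, time) query, rather than from the model's own previous frames, so per-frame errors have nothing to compound into.

```python
# Toy sketch of state-anchored sampling. All names are hypothetical;
# a real model would encode frames into latents and render images.
from dataclasses import dataclass


@dataclass
class WorldState:
    """Viewpoint-independent Local World State anchored to the reference video."""
    anchor: list[float]  # toy stand-in for the encoded reference video


def anchor_world_state(reference_video: list[list[float]]) -> WorldState:
    # World State Anchoring: fold the reference frames into one persistent state
    # (here, just the per-dimension mean of the toy frame features).
    n = len(reference_video)
    dims = len(reference_video[0])
    mean = [sum(frame[d] for frame in reference_video) / n for d in range(dims)]
    return WorldState(anchor=mean)


def sample_observation(state: WorldState, viewpoint: float, t: float) -> list[float]:
    # Spatiotemporal autoregression: each observation is a function of the
    # anchored state plus an explicit (viewpoint, time) query, never of the
    # model's own previously generated frames alone.
    return [a + 0.1 * viewpoint + 0.01 * t for a in state.anchor]


# Toy usage: a 3-frame "reference video" with 2-dimensional features.
ref = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
state = anchor_world_state(ref)
obs_a = sample_observation(state, viewpoint=0.0, t=0.0)
obs_b = sample_observation(state, viewpoint=0.0, t=0.0)
assert obs_a == obs_b  # same query, same state: identical observation, no drift
```

Because the state is fixed once, revisiting any (viewpoint, time) query reproduces the same observation, which is exactly the long-horizon stability property described above.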

For Filmmakers

The combination of spatial roaming and temporal control has direct applications in previsualization and shot planning. A director can feed existing reference footage, explore the resulting world from any angle, and then scrub backward and forward in time to identify the best moment and framing for a shot.

This is a fundamentally different workflow from standard video generation. Instead of generating a clip and evaluating it, you inhabit the scene and direct from inside it. The model keeps the world stable so creative decisions made at one timestamp or viewpoint remain valid across the exploration.
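The shape of that workflow can be sketched as a simple loop. Everything here is hypothetical: `NavigableWorld`, its `render` method, and the file name are invented stand-ins, not the real InSpatio-World interface. What the sketch shows is the structure the article describes: footage is loaded once, then camera and time are varied freely, and revisiting an earlier choice reproduces it exactly.

```python
# Hypothetical previsualization loop; class and method names are invented
# for illustration and do not reflect the actual InSpatio-World API.

class NavigableWorld:
    """Toy stand-in for a 4D world anchored to reference footage."""

    def __init__(self, reference_clip: str):
        self.reference_clip = reference_clip  # the world stays anchored to this

    def render(self, camera: tuple[float, float, float], t: float) -> str:
        # A real model would return a frame; we return a description so the
        # loop is runnable. Same (camera, t) always yields the same result,
        # which is the stability property the article describes.
        return f"{self.reference_clip} @ cam={camera} t={t:.2f}s"


world = NavigableWorld("warehouse_take_03.mp4")

# Scrub backward and forward in time from two candidate camera positions.
candidates = []
for camera in [(0.0, 1.5, 2.0), (3.0, 1.5, -1.0)]:
    for t in [2.0, 1.0, 3.5]:  # time can move backward as well as forward
        candidates.append(world.render(camera, t))

# Revisiting an earlier viewpoint/timestamp reproduces it exactly, so a
# framing chosen at one point in the exploration remains valid later.
assert world.render((0.0, 1.5, 2.0), 2.0) == candidates[0]
```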

Teams working with world simulators for virtual production can compare InSpatio-World against other open source approaches. Lingbot World from Ant Group focuses on camera pose control for cinematographic applications, and the MIND benchmark documents six specific challenges current systems face. InSpatio-World takes a different angle: it starts from a reference video rather than generating from scratch, which anchors the world to real captured dynamics rather than purely synthesized ones.

For reference video generation, AI FILMS Studio's video workspace lets you generate the source footage that InSpatio-World can then transform into a navigable 4D world.


Sources

InSpatio-World Project Page https://inspatio.github.io/inspatio-world/

InSpatio-World GitHub Repository https://github.com/inspatio/inspatio-world