DreamX-World 1.0: Open Source Interactive World Generation with Camera Navigation

June 16, 2026

Updated: July 23, 2026

AMAP-ML, the computer vision research lab within Alibaba's AutoNavi mapping division, released DreamX-World 1.0 on June 15, 2026. The model generates navigable environments from text prompts and supports six degree of freedom camera control across a wide range of scene types, from photorealistic urban streets to stylized fantasy worlds.

The model is built on a Wan2.2-T2V-5B base (~5 billion parameters) and produces video at 704×1280 resolution for up to 7.5 seconds per generation. It is released under the MIT license on HuggingFace and Apache 2.0 on GitHub, permitting commercial use. The accompanying paper was published to arXiv (2606.16993) on June 15.

Long Horizon Generation and World Memory

DreamX-World generates extended sequences that maintain spatial and temporal consistency across the full duration of navigation. Rather than producing isolated clips, the model tracks previously seen geometry and uses that stored context to keep the world coherent as the camera moves further from its starting point.

The following demo shows a long horizon sequence where the model sustains a consistent environment without visual collapse:

Long horizon video generation with consistent world state

The world memory system stores spatial context from previously generated frames and applies it to maintain visual continuity during navigation. This is what allows the camera to move away from and return to a location without the environment losing coherence:

World memory maintaining scene consistency during camera navigation

The spatial context system is what makes DreamX-World useful for extended previsualization work rather than just isolated clips. A director can navigate through a generated location, stop at a promising angle, return to an earlier viewpoint to check continuity, and the world holds. That kind of back and forth exploration is not possible in standard video generation.
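The article does not describe how the memory is implemented, so the following is only a toy illustration of the general idea of storing context by camera pose and looking it up again when the camera returns. The class names, the distance measure, and the `yaw_weight` value are all hypothetical and are not taken from DreamX-World.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MemoryEntry:
    position: Tuple[float, float, float]  # camera position when the frame was generated
    yaw_deg: float                        # camera heading when the frame was generated
    frame_id: int                         # stand-in for the stored frame or its features


@dataclass
class PoseMemory:
    """Toy pose-indexed memory (hypothetical; NOT the DreamX-World implementation)."""
    entries: List[MemoryEntry] = field(default_factory=list)

    def store(self, position, yaw_deg, frame_id):
        self.entries.append(MemoryEntry(tuple(position), yaw_deg, frame_id))

    def retrieve(self, position, yaw_deg, k=2, yaw_weight=0.01):
        """Return the k stored entries whose pose is closest to the query pose."""
        def cost(entry):
            dist = math.dist(entry.position, position)
            # Smallest absolute heading difference, wrapped into [0, 180] degrees.
            dyaw = abs((entry.yaw_deg - yaw_deg + 180.0) % 360.0 - 180.0)
            return dist + yaw_weight * dyaw
        return sorted(self.entries, key=cost)[:k]


memory = PoseMemory()
# The camera walks forward along +z and one frame is stored per unit of distance.
for i in range(5):
    memory.store((0.0, 0.0, float(i)), yaw_deg=0.0, frame_id=i)

# Returning near the start: the earliest frames are the closest stored context.
nearest = memory.retrieve((0.0, 0.0, 0.2), yaw_deg=0.0, k=2)
print([entry.frame_id for entry in nearest])  # [0, 1]
```

A real system would store image or latent features rather than integer ids, but the lookup step is the part that lets a revisited viewpoint be conditioned on what was generated there before.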

The "long horizon" framing is specific. Most open source world models produce reliable output for a few seconds before the scene degrades. DreamX-World's memory architecture extends this across the full 7.5-second generation window and across multiple consecutive generations, allowing spatial exploration rather than single clip sampling.
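As a rough planning aid, the number of consecutive generations needed for a longer walkthrough follows from the 7.5-second window. The sketch below assumes clips are simply placed back to back with no overlap, which may not match how the model actually chains generations; the function names are illustrative.

```python
import math

CLIP_SECONDS = 7.5  # maximum clip length quoted in the article


def clips_needed(total_seconds, clip_seconds=CLIP_SECONDS):
    """Number of back-to-back generations needed to cover total_seconds."""
    if total_seconds <= 0:
        return 0
    return math.ceil(total_seconds / clip_seconds)


def clip_windows(total_seconds, clip_seconds=CLIP_SECONDS):
    """Return (start, end) times in seconds for each consecutive generation."""
    return [
        (i * clip_seconds, min((i + 1) * clip_seconds, total_seconds))
        for i in range(clips_needed(total_seconds, clip_seconds))
    ]


print(clips_needed(30.0))   # 4
print(clip_windows(20.0))   # [(0.0, 7.5), (7.5, 15.0), (15.0, 20.0)]
```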

Real Worlds and Dream Worlds

DreamX-World generates scenes in two broad categories. In its realistic mode, it produces photographic environments: urban streets, forest paths, coastlines, and architectural interiors. In its stylized mode, it generates fantasy, sci-fi, and painterly worlds that would require extensive VFX or set construction to replicate on a physical production.

The two demos below show this range side by side:

Realistic environment navigation

Stylized dream world navigation

Third Person View

Beyond first person navigation, DreamX-World supports third person perspective generation. This mode places a figure or subject within the frame and generates the environment around them, with the camera following or orbiting rather than inhabiting the scene directly.

Third person view generation

Third person mode extends the model's value for character centric previsualization. Instead of scouting empty environments, you can generate a scene with a figure already present and test how camera placement reads against both the subject and the world around them.

Because the camera tracks the character in third person mode, DreamX-World fits previsualization workflows that need both a character in frame and a navigable environment behind them. Standard video tools require compositing several separate generations to get that combination. DreamX-World handles it in a single pass.

This mode is also relevant for game and interactive narrative prototyping, where third person perspective is the dominant camera grammar. Both the MIT and Apache 2.0 licenses permit those applications, including commercial ones, as long as the license and copyright notices are retained.

What Filmmakers Can Do With It

The six degree of freedom camera system covers the full range of physical camera moves: three translations (dolly forward and back, truck left and right, pedestal up and down) and three rotations (pan, tilt, and roll). This maps directly to pre-production workflows where directors and cinematographers need to test blocking and camera angles before committing to a real location or set build.
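For readers who think in data structures, six degrees of freedom means three translation values and three rotation values per camera pose. The sketch below is a generic representation, not DreamX-World's input format; the `CameraPose` class, its axis convention, and the `moved` helper are assumptions made for illustration, and offsets are applied along fixed world axes rather than relative to where the camera is pointing.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CameraPose:
    """Generic 6-DoF pose (hypothetical; not the model's actual input format)."""
    # Translations in scene units; the axis convention is an assumption.
    x: float = 0.0      # truck: left (-) / right (+)
    y: float = 0.0      # pedestal: down (-) / up (+)
    z: float = 0.0      # dolly: back (-) / forward (+)
    # Rotations in degrees.
    yaw: float = 0.0    # pan
    pitch: float = 0.0  # tilt
    roll: float = 0.0   # roll about the lens axis

    def moved(self, **deltas):
        """Return a new pose with each named degree of freedom offset by its delta."""
        updated = {name: getattr(self, name) + delta for name, delta in deltas.items()}
        return replace(self, **updated)


start = CameraPose()
# Dolly forward two units, then pan right 15 degrees while tilting down 5.
shot = start.moved(z=2.0).moved(yaw=15.0, pitch=-5.0)
print(shot)
```

Keeping the pose immutable means every move returns a new value, so a sequence of moves can be collected into a list and reviewed or replayed later.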

Virtual location scouting is one direct application. Generate and navigate an environment from a text description before visiting the real location or authorizing set construction costs. World building for animation and VFX is another: DreamX-World's stylized modes produce fantasy and sci-fi environments that serve as reference for concept art and production design. The model pairs naturally with camera motion cloning tools like OmniDirector. Use OmniDirector to extract a camera trajectory from a reference clip, then apply that move to a generated environment with DreamX-World.
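Reusing a camera move in a new environment comes down to replaying the motion relative to a different starting pose. The sketch below shows that step on the ground plane only (position plus heading), under an assumed rotation convention. It takes a plain list of poses as input and makes no claim about how OmniDirector or DreamX-World represent trajectories; `retarget` is a hypothetical helper.

```python
import math
from typing import List, Tuple

Pose2D = Tuple[float, float, float]  # (x, z, yaw_degrees) on the ground plane


def retarget(reference: List[Pose2D], new_start: Pose2D) -> List[Pose2D]:
    """Replay the motion of `reference` relative to its first pose, starting at `new_start`."""
    if not reference:
        return []
    x0, z0, yaw0 = reference[0]
    nx, nz, nyaw = new_start
    # Rotating by the heading difference aligns the reference start frame with the new one.
    angle = math.radians(nyaw - yaw0)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    out = []
    for x, z, yaw in reference:
        dx, dz = x - x0, z - z0
        out.append((
            nx + dx * cos_a - dz * sin_a,
            nz + dx * sin_a + dz * cos_a,
            nyaw + (yaw - yaw0),
        ))
    return out


# A reference move: travel two units along +z while turning 20 degrees.
reference_move = [(0.0, 0.0, 0.0), (0.0, 1.0, 10.0), (0.0, 2.0, 20.0)]
moved = retarget(reference_move, new_start=(5.0, 5.0, 90.0))
print([tuple(round(v, 6) for v in pose) for pose in moved])
```

The same idea extends to full 3D by composing 4×4 transforms instead of a single heading angle.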

For previsualization at scale, the model's event driven interaction capabilities mean the world can respond to changes mid-navigation: objects move, environments shift, the scene evolves. This makes it relevant not only for film pre-production but for interactive narrative and game world prototyping as well.

The text-to-video base means no source material is required to start a generation. Type a description of a location, choose a starting camera position, and navigate from there. This is substantially different from video to world approaches that require a reference clip before any exploration is possible.

Production design teams can use DreamX-World to build mood boards that are navigable rather than static. Instead of a series of concept images, a designer can generate a walkable version of a proposed environment and move through it before any physical construction decisions are made.

The model's 704×1280 output, read as 1280 pixels wide by 704 high, is wide aspect video at roughly 1.82:1, close to the 16:9 framing most productions work in during pre-production. Generated clips can feed into storyboarding and animatic workflows with little or no resizing.
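The aspect ratio arithmetic is easy to check. The snippet assumes the quoted "704×1280" is height × width, which is the reading consistent with the article calling the output wide aspect.

```python
from fractions import Fraction

# Assumption: the article's "704×1280" is height × width, i.e. a landscape frame.
WIDTH, HEIGHT = 1280, 704

aspect = Fraction(WIDTH, HEIGHT)          # exact ratio, reduced to lowest terms
print(aspect, round(float(aspect), 3))    # 20/11 1.818
print(round(16 / 9, 3))                   # 1.778

# Placing the frame unscaled on a 1280×720 (16:9) canvas leaves equal bars top and bottom.
bar = (720 - HEIGHT) // 2
print(bar)                                # 8
```

So the frame is slightly wider than 16:9, and dropping it into a 1280×720 timeline leaves an 8-pixel bar above and below unless it is scaled or cropped.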

License and Access

DreamX-World 1.0 is released under MIT license on HuggingFace and Apache 2.0 on GitHub. Both licenses allow commercial use. The weights for two variants are publicly downloadable: the general world generation model (DreamX-World-5B) and the camera-controlled variant (DreamX-World-5B-Cam).
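One plausible way to fetch either variant is the `snapshot_download` function from the `huggingface_hub` package, using the repository names given under Sources. This is a minimal sketch rather than the project's documented setup: the `VARIANTS` mapping and helper names are illustrative, and the download itself needs network access and enough disk space for a model of about 5 billion parameters.

```python
# Repository ids as listed in the Sources section of this article.
VARIANTS = {
    "general": "GD-ML/DreamX-World-5B",
    "camera": "GD-ML/DreamX-World-5B-Cam",
}


def repo_for(variant):
    """Map a short variant name to its HuggingFace repository id."""
    try:
        return VARIANTS[variant]
    except KeyError:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(VARIANTS)}"
        ) from None


def download_weights(variant, local_dir):
    """Download one variant's files and return the local path (needs network access)."""
    # Imported lazily so repo_for() works even where huggingface_hub is not installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_for(variant), local_dir=local_dir)


print(repo_for("camera"))  # GD-ML/DreamX-World-5B-Cam
```

Calling `download_weights("camera", "./dreamx-world-cam")` would then place the camera-controlled variant in a local folder; the folder name here is arbitrary.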

AMAP-ML is the research lab of Amap (AutoNavi), Alibaba's mapping and navigation platform. The lab's core work is representing real places in spatial data structures, not just generating plausible video, and that background in three dimensional modeling of real world environments informs the accuracy of DreamX-World's six degree of freedom control.

Two related open source projects take different routes to interactive world generation. InSpatio-World takes a parallel approach, turning video inputs into navigable 4D scenes. AlayaWorld, released in July 2026, takes a complementary approach by pairing long horizon consistency and camera control with text prompt driven scene interactions and a stylized game world generation mode. It is released under Apache 2.0, the same license as DreamX-World's GitHub repository.

The paper's authors come from AutoNavi's mapping and location intelligence team, which has published consistently in computer vision. The team's experience representing real places as three dimensional data is a different starting point than purely synthetic approaches.

DreamX-World-5B-Cam, the camera controlled variant, is the model most directly relevant to filmmaking workflows. It accepts explicit camera trajectory inputs, letting the user specify not just where to start but the exact path the camera will take through the generated world.
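An explicit camera path is usually authored as a handful of keyframes and then expanded into one pose per step. The sketch below shows that expansion with plain linear interpolation. The tuple layout and `steps_per_segment` parameter are assumptions for illustration, the actual trajectory format expected by DreamX-World-5B-Cam is not described in this article, and linear interpolation of angles does not handle wraparound past 360 degrees.

```python
from typing import List, Sequence, Tuple

# Hypothetical pose layout: (x, y, z, yaw, pitch, roll).
Pose = Tuple[float, float, float, float, float, float]


def lerp_pose(a: Pose, b: Pose, t: float) -> Pose:
    """Blend two poses component by component; t=0 gives a, t=1 gives b."""
    return tuple(p + (q - p) * t for p, q in zip(a, b))


def sample_path(keyframes: Sequence[Pose], steps_per_segment: int) -> List[Pose]:
    """Expand keyframes into a dense path with linear interpolation between them."""
    if not keyframes:
        raise ValueError("at least one keyframe is required")
    if steps_per_segment < 1:
        raise ValueError("steps_per_segment must be at least 1")
    path = [tuple(keyframes[0])]
    for a, b in zip(keyframes, keyframes[1:]):
        for step in range(1, steps_per_segment + 1):
            path.append(lerp_pose(a, b, step / steps_per_segment))
    return path


keyframes = [
    (0.0, 0.0, 0.0, 0.0, 0.0, 0.0),   # start
    (0.0, 0.0, 4.0, 0.0, 0.0, 0.0),   # dolly forward four units
    (0.0, 0.0, 4.0, 90.0, 0.0, 0.0),  # then pan 90 degrees in place
]
path = sample_path(keyframes, steps_per_segment=2)
print(len(path))  # 5
for pose in path:
    print(pose)
```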

The GitHub repository includes the full training and inference code. Production teams with the compute capacity to run the 5 billion parameter model locally can adapt the weights for domain specific fine-tuning. Both licenses allow modified and derivative versions to be used and distributed, provided the original license and copyright notices are kept.

The Apache 2.0 license that governs the GitHub version differs from the HuggingFace MIT license mainly in its conditions. Apache 2.0 includes an explicit patent grant and requires modified files to carry a notice of the changes, while MIT asks only that the copyright and license notice be retained. Both permit commercial use. Teams working in environments where license specifics matter should confirm which repository they are pulling from before deploying.

A related release from Alibaba Group, Holo-World, extends camera and object control with a weather conditioning system that applies snow, rain, fog, and cloud effects to any generated scene without changing the underlying motion.

The same AMAP CV Lab released ABot-World-0 in July 2026, a 5 billion parameter action conditioned world model that complements DreamX-World's camera trajectory approach with an action based navigation interface. ABot-World-0 uses LongForcing training for long horizon stability, runs at 720p on a single RTX 5090, and is available under Apache 2.0.

Sources

GitHub: AMAP-ML/DreamX-World
HuggingFace: GD-ML/DreamX-World-5B-Cam | GD-ML/DreamX-World-5B
arXiv: DreamX-World: Towards Generalized Controllable World Generation
Project Page: amap-ml.github.io/DreamX_World
