NVIDIA Lyra 2.0: Open Source 3D World Generation from a Single Image

April 15, 2026

NVIDIA's Spatial Intelligence Lab has released Lyra 2.0, an open-source framework that converts a single photograph into a navigable, geometrically consistent 3D world. The system is licensed under Apache 2.0, which permits commercial use, and ships with model weights on Hugging Face and code on GitHub.

Lyra 2.0, NVIDIA Spatial Intelligence Lab

From Image to Interactive World

Lyra 2.0 takes one input image, generates a camera-controlled video walkthrough of the scene, and reconstructs that walkthrough into 3D Gaussian Splats and surface meshes. Those outputs load directly into real-time rendering engines and physics simulators.

The system is built on Wan 2.1-14B, a Diffusion Transformer trained for video generation. It produces frames at 832×480 resolution using 35 denoising steps. A distilled variant reduces that to 4 steps for faster iteration without retraining.

Lyra 1.0, released in September 2025, introduced the core pipeline for 3D and 4D scene generation from single images. Lyra 2.0 extends that foundation to long-horizon exploration, letting users navigate through large spatial regions while the model maintains geometric consistency across the entire sequence. Other open-source world simulators tackling this problem include Lingbot World, which builds on Wan 2.2 for camera-controlled environment generation under the same Apache 2.0 terms.

Keeping the World Consistent

Two failure modes undermine long-horizon 3D generation. Spatial forgetting occurs when the model revisits a distant region and hallucinates geometry rather than reproducing what it already generated. Temporal drifting happens when small per-frame errors accumulate until the scene diverges from the original input.

Lyra 2.0 addresses spatial forgetting through geometry-based frame retrieval. Instead of feeding historical frames directly as image conditioning, the system routes information exclusively through 3D geometry. It retrieves the historical frames whose content is most visible from the target viewpoint and establishes dense correspondences via canonical coordinate warping.
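The retrieval step can be pictured with a small toy example. The sketch below is not Lyra 2.0's implementation: it assumes a simplified top-down 2D scene, and the names Camera, visible_points, and retrieve_frames are hypothetical. It ranks past camera poses by how many scene points they share with the target view and does not attempt the canonical coordinate warping step.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Camera:
    """Toy top-down camera (hypothetical, for illustration only)."""
    x: float
    z: float
    yaw: float                  # heading in radians, 0 points along +z
    fov: float = math.pi / 2    # full field of view in radians
    max_range: float = 10.0     # points farther than this are ignored


def visible_points(cam, points):
    """Return indices of the 2D points that fall inside the camera's view wedge."""
    seen = set()
    for i, (px, pz) in enumerate(points):
        dx, dz = px - cam.x, pz - cam.z
        dist = math.hypot(dx, dz)
        if dist == 0 or dist > cam.max_range:
            continue
        # Angle between the camera heading and the point, wrapped to [-pi, pi)
        angle = math.atan2(dx, dz) - cam.yaw
        angle = (angle + math.pi) % (2 * math.pi) - math.pi
        if abs(angle) <= cam.fov / 2:
            seen.add(i)
    return seen


def retrieve_frames(target, history, points, k=2):
    """Pick up to k past frames that share the most visible points with the target."""
    target_seen = visible_points(target, points)
    scored = []
    for idx, cam in enumerate(history):
        overlap = len(target_seen & visible_points(cam, points))
        scored.append((overlap, idx))
    # Highest overlap first; ties go to the earlier frame
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [idx for overlap, idx in scored[:k] if overlap > 0]


# Three points ahead of the origin, one behind it
scene = [(0, 5), (1, 5), (-1, 5), (0, -5)]
past = [Camera(0, 0, 0.0), Camera(0, 0, math.pi)]   # one looks forward, one backward
new_view = Camera(0, 1, 0.0)                        # target: one step forward, same heading
print(retrieve_frames(new_view, past, scene))       # -> [0]
```

Only the forward-facing past frame is returned, because the backward-facing one shares no visible geometry with the target view. A real system would run this kind of test against reconstructed 3D geometry with occlusion handling rather than a flat wedge check.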

Temporal drifting is reduced through self-augmentation training, where the model trains on its own degraded outputs. This closes the gap between training conditions and inference behavior. FramePack temporal compression runs alongside this strategy to limit how far reconstruction errors propagate across extended sequences.

Diverse Scene Generation

Lyra 2.0 handles a wide range of environment types from single input images, including fantasy interiors, futuristic urban landscapes, outdoor terrain, and architectural spaces.

Example scenes: surreal Eastern fantasy, futuristic street terrace.

On the DL3DV and Tanks and Temples benchmarks, Lyra 2.0 scores an LPIPS of 0.552, an FID of 51.33, and a style consistency of 85.07%. These figures measure, respectively, perceptual quality, distribution fidelity, and visual coherence across generated frames. For LPIPS and FID, lower values are better.

Creating Simulation-Ready 3D Environments with an Interactive GUI

Lyra 2.0 ships an interactive GUI for scene exploration and camera trajectory planning. Users draw paths through the generated environment, and the model progressively extends the scene as the virtual camera moves forward.

Lyra 2.0 in NVIDIA Isaac Sim, delivery robot navigation

The GUI exports finished scenes directly to NVIDIA Isaac Sim for embodied AI and robotics training. A delivery robot navigating a new facility can be trained in a simulated version of that facility, built in minutes from a single photograph. The 3D Gaussian Splat export preserves full spatial geometry and texture detail for physics engine use.

Example environments: robot environment, Chinese temple, staircase, coastal scene.

For filmmakers, the GUI offers a faster path to spatial previsualization than manual 3D modeling. Generate a scene from a location photograph, walk through it to plan shots, then export to a game engine. Explore more ways to generate and manipulate scenes in the AI FILMS Studio video workspace.

Open Source and Available for Commercial Use

Lyra 2.0 is released under the Apache 2.0 license, which permits commercial use, modification, and redistribution. Model weights are available on Hugging Face at nvidia/Lyra-2.0. Source code is on GitHub at nv-tlabs/lyra, with 927 stars and 55 forks at publication.

No managed inference endpoint exists at launch, so teams run the model locally on their own hardware. The Wan 2.1-14B base requires a GPU with enough VRAM for 14-billion-parameter inference.

NVIDIA LongLive, the lab's earlier release for real-time interactive long video generation, shipped under non-commercial terms. Lyra 2.0's Apache 2.0 license removes that restriction for commercial production pipelines. NVIDIA's SANA-WM, released in May 2026, extends the SANA model family to 60-second 720p world modeling under the same Apache 2.0 terms. NVIDIA followed Lyra 2.0 with Cosmos 3, a much larger open world foundation model released in June 2026 under the OpenMDW1.1 license with commercial use permitted, available in 16B and 65B variants.
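As a rough guide to the VRAM needed for a 14-billion-parameter base, the memory taken by the weights alone can be sized with simple arithmetic. The sketch below is a back-of-the-envelope estimate, not a measured figure for Lyra 2.0. The function name and the list of precisions are illustrative assumptions, and the result excludes activations, auxiliary networks, and the 3D reconstruction stage, so real usage will be higher.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed to hold the weights only, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


PARAMS = 14e9  # 14 billion parameters, per the Wan 2.1-14B base named in the article

# Bytes per parameter for common numeric precisions (illustrative choices)
for name, nbytes in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1)]:
    print(f"{name}: {weight_memory_gb(PARAMS, nbytes):.0f} GB")
# prints:
# fp32: 56 GB
# bf16/fp16: 28 GB
# int8: 14 GB
```

Under these assumptions, half-precision weights already occupy about 28 GB before any working memory is counted, which gives a sense of the hardware class that local inference calls for.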

For post-production lighting control on existing footage, UniRelight is a separate NVIDIA Toronto AI Lab release that relights video without regenerating scenes, under non-commercial terms.

Image-to-3D approaches are converging across labs. Tencent's HunyuanWorld Mirror covers similar territory using Gaussian splatting for scene representation. Lyra 2.0 distinguishes itself with the anti-forgetting and anti-drifting mechanisms that sustain consistency across longer trajectories, and with the Isaac Sim pipeline for simulation use.

Sources

arXiv: Lyra 2.0: Explorable Generative 3D Worlds

GitHub: nv-tlabs/lyra

Hugging Face: nvidia/Lyra-2.0

Project Page: research.nvidia.com/labs/sil/projects/lyra2
