VideoFrom3D: 3D scene video generation from coarse geometry

What it is
VideoFrom3D generates watchable scene videos from the ingredients filmmakers already have on hand: a rough 3D blockout, a camera path that matches your boards, and a single reference image to set look and mood. Instead of asking you to invent motion out of thin air, the system leans on a simple idea: use an image model to produce high-quality anchor views that respect composition and style, then use a video model to fill the in-between frames so the camera move feels continuous. That split keeps detail where it counts and avoids the mushy motion that plagues many one-shot generators. In practice you get a clip that follows your lensing and timing, carries the palette and texture you asked for, and arrives fast enough to keep a meeting moving. It is a tool for scene videos and exploratory cinematography, not a character-animation replacement, which makes it a clean fit for previs and pitch work.
Why filmmakers should care
Previs lives on speed and clarity. With VideoFrom3D you can turn a greybox level into a styled moving scene that reflects your camera notes and grading intent, then make decisions with your collaborators while everyone looks at the same picture. That helps directors test blocking and eye trace, helps editors feel pacing before a single day on set, and gives production design a shared target for materials and scale. Because the method respects a long prompt with concrete details, you can describe lens choice, time of day, atmosphere, and typography if you want diegetic text or signage, and those elements persist across the shot. For pitches and lookbooks, this replaces the usual jump from stills to imagination with a short, coherent pass you can actually cut against. The goal is not perfection. It is to remove the dead time between an idea and a clip that communicates that idea to a team.
How it works in plain language
The pipeline has two cooperating pieces. First comes Sparse Anchor-view Generation, which uses an image diffusion model to render a handful of key frames that align across viewpoints and carry your style reference with high fidelity. Those anchors establish the look. Then Geometry-Guided Generative Inbetweening takes over. A video diffusion model, guided by your camera path and by the structure of the coarse geometry, synthesizes the intermediate frames that connect those anchors. Because the camera trajectory is your own, the result reads like a deliberate move rather than a drifting hallucination. This division of labor is why the output feels both sharp and continuous. You are asking each model to do the thing it does best and letting geometry keep them honest. The result is a moving shot that obeys your layout and timing while borrowing richness from a single image.
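The division of labor above can be sketched in a few lines. Everything here is a hypothetical stand-in, not the authors' code: the function names are invented, the "anchors" are placeholder arrays, and linear blending stands in for the geometry-guided video diffusion model, just to show where anchors and in-between frames slot into the camera path.

```python
import numpy as np

def generate_anchor_views(camera_path, num_anchors):
    # Stand-in for Sparse Anchor-view Generation: pick evenly spaced
    # key poses along the camera path and "render" placeholder frames
    # (here seeded random 4x4 images in place of diffusion outputs).
    idx = np.linspace(0, len(camera_path) - 1, num_anchors).astype(int)
    rng = np.random.default_rng(0)
    return idx, [rng.random((4, 4)) for _ in idx]

def inbetween(anchor_a, anchor_b, steps):
    # Stand-in for Geometry-Guided Generative Inbetweening: the real
    # model synthesizes intermediates guided by the coarse geometry;
    # here we blend linearly just to mark where those frames go.
    return [anchor_a * (1 - t) + anchor_b * t
            for t in np.linspace(0, 1, steps, endpoint=False)]

def render_clip(camera_path, num_anchors=3):
    # Anchors fix the look at key poses; inbetweening connects them
    # so every camera pose gets exactly one frame.
    idx, anchors = generate_anchor_views(camera_path, num_anchors)
    frames = []
    for (i0, a0), (i1, a1) in zip(zip(idx, anchors),
                                  zip(idx[1:], anchors[1:])):
        frames.extend(inbetween(a0, a1, i1 - i0))
    frames.append(anchors[-1])
    return frames

camera_path = list(range(25))   # 25 camera poses along the move
clip = render_clip(camera_path)  # 25 frames, one per pose
```

The structural point survives the toy substitution: anchors pin style at a few poses your boards care about, and the in-between stage only ever has to connect two known endpoints along a trajectory you authored.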
What it is not
This is not motion capture, pose transfer, or a tool for dialog performance. It is about environments, cameras, and style. If you need character acting, you will still stage that work with the right tools or performers and composite as needed. Think of VideoFrom3D as a fast way to scout virtually, prove a move, or set the tone for a space. In that role it shines, because it removes friction without blurring authorship. You provide the layout and the path. The system gives you a version worth showing, and you can iterate without throwing away your scene structure.
License and practical notes
As of now, the public repository does not include a license file, which means no open-source grant by default. Treat the code and any linked checkpoints as research artifacts and get explicit permission from the authors before using them in commercial work. For best results, bring your own coarse geometry from your DCC or engine, author a camera path that matches the storyboard beats, and use a single strong reference image to define palette and texture. Save prompts, seeds, and commit hashes to make runs reproducible, and expect research-grade edges while the implementation evolves.
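Saving prompts, seeds, and commit hashes can be as simple as writing one JSON manifest per run. This is a minimal sketch, not part of VideoFrom3D: the function name, fields, and file paths are all illustrative, and the commit hash would come from `git rev-parse HEAD` on your checkout.

```python
import json, tempfile, time
from pathlib import Path

def save_run_manifest(out_dir, prompt, seed, commit_hash,
                      reference_image, camera_path_file):
    """Write a small JSON record of everything needed to rerun a generation."""
    manifest = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "prompt": prompt,            # the full long-form prompt, verbatim
        "seed": seed,                # RNG seed used for this run
        "commit": commit_hash,       # e.g. output of `git rev-parse HEAD`
        "reference_image": reference_image,
        "camera_path": camera_path_file,
    }
    path = Path(out_dir) / "run_manifest.json"
    path.write_text(json.dumps(manifest, indent=2))
    return path

# Example: log a run into a scratch directory (paths are illustrative).
run_dir = tempfile.mkdtemp()
manifest_path = save_run_manifest(
    run_dir,
    prompt="narrow alley at blue hour, wet asphalt, 35mm look",
    seed=42,
    commit_hash="0000000",  # placeholder; record the real checkout hash
    reference_image="ref/alley_style.png",
    camera_path_file="cams/shot_012.json",
)
```

Drop the manifest next to the rendered clip and a later pass can replay the same seed and prompt against the same commit, which is most of what "reproducible" means while the code is still moving.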
Sources
- Project page: https://kimgeonung.github.io/VideoFrom3D/
- Paper: https://arxiv.org/abs/2509.17985
- GitHub: https://github.com/KIMGEONUNG/VideoFrom3D
- Checkpoint folder: https://drive.google.com/drive/folders/1IhI9qDv6tH5T7XzeEjx27UYw2EqZ7MKY