VideoFrom3D: 3D scene video generation from coarse geometry

What it is
VideoFrom3D generates watchable scene videos from the ingredients filmmakers already have on hand: a rough 3D blockout, a camera path that matches your boards, and a single reference image to set look and mood. Instead of asking you to invent motion out of thin air, the system leans on a simple idea: use an image model to produce high-quality anchor views that respect composition and style, then use a video model to fill the in-between frames so the camera move feels continuous. That split keeps detail where it counts and avoids the mushy motion that plagues many one-shot generators. In practice you get a clip that follows your lensing and timing, carries the palette and texture you asked for, and arrives fast enough to keep a meeting moving. It is a tool for scene videos and exploratory cinematography, not a character-animation replacement, which makes it a clean fit for previs and pitch work.
Why filmmakers should care
Previs lives on speed and clarity. With VideoFrom3D you can turn a greybox level into a styled moving scene that reflects your camera notes and grading intent, then make decisions with your collaborators while everyone looks at the same picture. That helps directors test blocking and eye trace, helps editors feel pacing before a single day on set, and gives production design a shared target for materials and scale. Because the method respects a long prompt with concrete details, you can describe lens choice, time of day, atmosphere, and typography if you want diegetic text or signage, and those elements persist across the shot. For pitches and lookbooks, this replaces the usual jump from stills to imagination with a short, coherent pass you can actually cut against. The goal is not perfection. It is to remove the dead time between an idea and a clip that communicates that idea to a team.
How it works in plain language
The pipeline has two cooperating pieces. First comes Sparse Anchor-view Generation, which uses an image diffusion model to render a handful of key frames that align across viewpoints and carry your style reference with high fidelity. Those anchors establish the look. Then Geometry-Guided Generative Inbetweening takes over. A video diffusion model, guided by your camera path and by the structure of the coarse geometry, synthesizes the intermediate frames that connect those anchors. Because the camera trajectory is your own, the result reads like a deliberate move rather than a drifting hallucination. This division of labor is why the output feels both sharp and continuous. You are asking each model to do the thing it does best and letting geometry keep them honest. The result is a moving shot that obeys your layout and timing while borrowing richness from a single image.
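The division of labor above can be sketched in a few lines. Everything here is a hypothetical stand-in, not the authors' code: the function names are invented, the "anchors" are placeholder arrays, and linear blending stands in for the geometry-guided video diffusion model, just to show where anchors and in-between frames slot into the camera path.

```python
import numpy as np

def generate_anchor_views(camera_path, num_anchors):
    # Stand-in for Sparse Anchor-view Generation: pick evenly spaced
    # key poses along the camera path and "render" placeholder frames
    # (here seeded random 4x4 images in place of diffusion outputs).
    idx = np.linspace(0, len(camera_path) - 1, num_anchors).astype(int)
    rng = np.random.default_rng(0)
    return idx, [rng.random((4, 4)) for _ in idx]

def inbetween(anchor_a, anchor_b, steps):
    # Stand-in for Geometry-Guided Generative Inbetweening: the real
    # model synthesizes intermediates guided by the coarse geometry;
    # here we blend linearly just to mark where those frames go.
    return [anchor_a * (1 - t) + anchor_b * t
            for t in np.linspace(0, 1, steps, endpoint=False)]

def render_clip(camera_path, num_anchors=3):
    # Anchors fix the look at key poses; inbetweening connects them
    # so every camera pose gets exactly one frame.
    idx, anchors = generate_anchor_views(camera_path, num_anchors)
    frames = []
    for (i0, a0), (i1, a1) in zip(zip(idx, anchors),
                                  zip(idx[1:], anchors[1:])):
        frames.extend(inbetween(a0, a1, i1 - i0))
    frames.append(anchors[-1])
    return frames

camera_path = list(range(25))   # 25 camera poses along the move
clip = render_clip(camera_path)  # 25 frames, one per pose
```

The structural point survives the toy substitution: anchors pin style at a few poses your boards care about, and the in-between stage only ever has to connect two known endpoints along a trajectory you authored.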
What it is not
This is not motion capture, pose transfer, or a tool for dialog performance. It is about environments, cameras, and style. If you need character acting, you will still stage that work with the right tools or performers and composite as needed. Think of VideoFrom3D as a fast way to scout virtually, prove a move, or set the tone for a space. In that role it shines, because it removes friction without blurring authorship. You provide the layout and the path. The system gives you a version worth showing, and you can iterate without throwing away your scene structure.
License and practical notes
As of now, the public repository does not include a license file, which means no open-source grant by default. Treat the code and any linked checkpoints as research artifacts and get explicit permission from the authors before using them in commercial work. For best results, bring your own coarse geometry from your DCC or engine, author a camera path that matches the storyboard beats, and use a single strong reference image to define palette and texture. Save prompts, seeds, and commit hashes to make runs reproducible, and expect research-grade edges while the implementation evolves.
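Saving prompts, seeds, and commit hashes can be as simple as writing one JSON manifest per run. This is a minimal sketch, not part of VideoFrom3D: the function name, fields, and file paths are all illustrative, and the commit hash would come from `git rev-parse HEAD` on your checkout.

```python
import json, tempfile, time
from pathlib import Path

def save_run_manifest(out_dir, prompt, seed, commit_hash,
                      reference_image, camera_path_file):
    """Write a small JSON record of everything needed to rerun a generation."""
    manifest = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "prompt": prompt,            # the full long-form prompt, verbatim
        "seed": seed,                # RNG seed used for this run
        "commit": commit_hash,       # e.g. output of `git rev-parse HEAD`
        "reference_image": reference_image,
        "camera_path": camera_path_file,
    }
    path = Path(out_dir) / "run_manifest.json"
    path.write_text(json.dumps(manifest, indent=2))
    return path

# Example: log a run into a scratch directory (paths are illustrative).
run_dir = tempfile.mkdtemp()
manifest_path = save_run_manifest(
    run_dir,
    prompt="narrow alley at blue hour, wet asphalt, 35mm look",
    seed=42,
    commit_hash="0000000",  # placeholder; record the real checkout hash
    reference_image="ref/alley_style.png",
    camera_path_file="cams/shot_012.json",
)
```

Drop the manifest next to the rendered clip and a later pass can replay the same seed and prompt against the same commit, which is most of what "reproducible" means while the code is still moving.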
Sources
- Project page: https://kimgeonung.github.io/VideoFrom3D/
- Paper: https://arxiv.org/abs/2509.17985
- GitHub: https://github.com/KIMGEONUNG/VideoFrom3D
- Checkpoint folder: https://drive.google.com/drive/folders/1IhI9qDv6tH5T7XzeEjx27UYw2EqZ7MKY