SANA Video | small, fast text-to-video with linear attention

September 29, 2025

SANA Video | what matters

SANA Video is a small diffusion model for fast, long-form video generation. The project page describes a target of up to 720 by 1280 resolution, a maximum duration of around one minute, and results that track prompts closely while keeping latency low enough to feel practical. The design centers on a Linear Diffusion Transformer, which replaces standard quadratic attention with a linear variant to cut compute at large token counts. On top of that, the authors introduce a constant-memory state derived from the cumulative properties of linear attention. Instead of caching a set of keys and values that grows with the sequence, the model maintains a fixed-size state that preserves global context at a steady memory cost. That is what lets it extend to longer clips without running out of VRAM or letting inference time balloon. They also report a training recipe that emphasizes data filtering and staged resolution increases to contain cost while preserving motion quality and aesthetics.
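To make the fixed-state idea concrete, here is a minimal sketch of generic causal linear attention in plain Python. It is not the SANA Video implementation: the ReLU feature map, the EPS constant, and the function names are all assumptions chosen for illustration, and the paper's actual block design is not reproduced. The streaming version keeps only a running sum S (of size d_k by d_v) and a normalizer z (of size d_k), while the reference version revisits every past key and value the way a growing cache would.

```python
EPS = 1e-6  # assumed small constant to avoid division by zero


def feature_map(x):
    # Assumed ReLU feature map; other non-negative maps would also work here.
    return [max(v, 0.0) for v in x]


def linear_attention_streaming(queries, keys, values):
    """Causal linear attention with a fixed-size running state."""
    d_k, d_v = len(keys[0]), len(values[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) * v^T
    z = [0.0] * d_k                        # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        fk = feature_map(k)
        for i in range(d_k):
            z[i] += fk[i]
            for j in range(d_v):
                S[i][j] += fk[i] * v[j]
        fq = feature_map(q)
        denom = sum(fq[i] * z[i] for i in range(d_k)) + EPS
        outputs.append(
            [sum(fq[i] * S[i][j] for i in range(d_k)) / denom for j in range(d_v)]
        )
    return outputs, (S, z)


def linear_attention_reference(queries, keys, values):
    """Same math, but recomputed over all past tokens (a growing cache)."""
    d_v = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        fq = feature_map(q)
        weights = [
            sum(a * b for a, b in zip(fq, feature_map(k))) for k in keys[: t + 1]
        ]
        denom = sum(weights) + EPS
        outputs.append(
            [
                sum(w * v[j] for w, v in zip(weights, values[: t + 1])) / denom
                for j in range(d_v)
            ]
        )
    return outputs
```

Both functions produce the same outputs up to floating-point rounding, but the streaming state stays at d_k by d_v plus d_k numbers whether the sequence holds five tokens or five thousand. That is the property the paragraph above describes, shown here in its simplest generic form.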

For filmmaking, the appeal is straightforward. Many text-to-video systems can produce beautiful five-second shots, but long sequences usually bog down on memory or drift off prompt. SANA Video aims to hold both length and coherence while running on accessible hardware. The page highlights deployment on an RTX 5090 using NVFP4 precision with a reported speedup over standard settings, and the paper reports favorable latency comparisons against other small models. There are also image-to-video examples and references to multi-scene transitions for longer narratives. If those claims hold when the code lands, SANA Video could serve as a fast previz engine for story beats, blocking, and tone tests where you need more than a single burst of motion. Until then, treat the page and preprint as a technical roadmap and a set of reported benchmarks rather than a drop-in tool.

Availability and licensing

As of this writing, the public page lists "Code (coming soon)" and states that the code and model will be publicly released. No repository, license file, or model weights are posted yet. That means there is no open-source release at this moment and no published license that permits commercial use. You can read the paper and review the demos, but you cannot adopt the system until NVIDIA publishes code and weights together with explicit terms. When they do, verify the license on the official repository and model card before integrating SANA Video into a production pipeline.

Sources

Project page: https://nvlabs.github.io/Sana/Video/
Paper (PDF): https://arxiv.org/pdf/2509.24695
