Helios: Real Time Long Video Generation at 19.5 FPS on a Single GPU
A research team from Peking University, ByteDance, and Canva has released Helios, a 14 billion parameter autoregressive diffusion model that generates minute scale videos at 19.5 frames per second on a single NVIDIA H100 GPU. The model achieves this without KV-cache, quantization, sparse attention, or any of the standard acceleration techniques the field relies on. It is open source under the Apache 2.0 license and available for commercial use.
Three Core Breakthroughs
Helios is built around three engineering advances that together explain why the model performs so differently from its peers.
Long Video Stability Without Anti-Drift Techniques
Autoregressive video models accumulate errors over time. Each generated frame conditions the next, and small deviations compound into visible drift: repetitive loops, incoherent motion, and gradual scene degradation. The standard industry response has been to add corrective mechanisms like self-forcing, error-banks, and keyframe sampling.
Helios skips all of them. Instead, the team trained the model with targeted strategies that simulate drifting conditions during training itself. The model learns to handle distributional shift from within, rather than relying on external guardrails. This allows it to maintain coherence through up to 1,440 frames, roughly one minute at 24 FPS, without visible degradation.
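The article does not spell out which training strategies Helios uses to simulate drift. A common technique in this family of models is noise augmentation of the conditioning frames, so the model learns to generate from imperfect context rather than the clean history it sees under teacher forcing. The sketch below illustrates that general idea in NumPy; `corrupt_history` and its noise schedule are illustrative assumptions, not the paper's recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt_history(history, max_sigma=0.3, rng=rng):
    """Simulate inference-time drift during training by perturbing the
    conditioning (history) frames with randomly scaled Gaussian noise,
    so the model learns to stay coherent on imperfect context."""
    sigma = rng.uniform(0.0, max_sigma)          # per-sample drift strength
    noise = rng.normal(size=history.shape)       # Gaussian perturbation
    return history + sigma * noise, sigma

# Toy batch: 8 context frames of 16x16 grayscale "video"
history = rng.normal(size=(8, 16, 16))
corrupted, sigma = corrupt_history(history)
```

At inference time no noise is added; the point is that the training distribution already covers the slightly-off context the model produces for itself during long rollouts.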
Helios: minute long video generation without drift
Real Time Speed Without Standard Acceleration
The 19.5 FPS figure is particularly striking because Helios reaches it through architectural compression rather than inference tricks. The model aggressively compresses historical and noisy context representations and reduces sampling steps. The result is that a 14 billion parameter model incurs computational costs similar to a 1.3 billion parameter model at inference time.
No KV-cache. No causal masking shortcuts. No TinyVAE. No quantization. The speed is native to the architecture, not bolted on afterward.
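The release does not publish token counts or compression ratios, but some back-of-envelope arithmetic (my numbers, not the paper's) shows why compressing the historical context pays off so dramatically: self-attention cost grows quadratically with the number of tokens attended over.

```python
def attention_flops(n_tokens, dim):
    """Rough FLOPs for one self-attention layer: the QK^T and AV matmuls
    each cost ~2 * n^2 * d multiply-adds (projections and constants omitted)."""
    return 4 * n_tokens**2 * dim

# Illustrative figures only: suppose the history contributes 1,560 tokens
# and architectural compression keeps 1 token in 8.
full = attention_flops(1560, 128)
compressed = attention_flops(1560 // 8, 128)
ratio = full / compressed   # 8x fewer tokens -> 64x fewer attention FLOPs
```

An 8x reduction in context tokens yields a 64x reduction in attention compute, which is the kind of quadratic saving that lets a 14B model run at the cost of a much smaller one without caching or quantization tricks.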
Helios text-to-video short showcase
Training Efficiency at Scale
Most 14B model training runs require tensor parallelism, pipeline parallelism, or sharding frameworks to manage GPU memory. Helios was trained without any of them. The team's infrastructure optimizations allow batch sizes comparable to those of image diffusion model training and fit up to four full 14B instances within 80 GB of GPU memory.
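The article does not describe how four instances fit, but simple arithmetic (mine, not the paper's) shows why that figure implies aggressive memory optimization: plain bf16 weights alone would overflow the budget.

```python
def avg_bytes_per_param(total_gb, n_models, params):
    """Average storage budget per parameter when fitting several
    full model instances into one GPU's memory."""
    return total_gb * 1e9 / (n_models * params)

# Plain bf16 weights: 14e9 params x 2 bytes = 28 GB per instance,
# so four unoptimized copies (112 GB) would overflow an 80 GB card.
bf16_gb = 14e9 * 2 / 1e9
# Fitting four full 14B instances in 80 GB implies an average budget
# of ~1.43 bytes/param, i.e. sub-16-bit storage or shared buffers.
budget = avg_bytes_per_param(80, 4, 14e9)
```

Whatever the exact mechanism, the headline claim only works with well under two bytes per parameter on average, which is what makes single-GPU fine-tuning plausible.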
This has practical significance beyond the lab. Researchers and developers who want to fine-tune or experiment with Helios face a much lower infrastructure barrier than comparable models at this parameter scale.
Model Variants and Approximate Inference Times
The release includes three model variants targeting different quality and speed tradeoffs:
| Variant | Prediction | Scheduler | Priority |
|---|---|---|---|
| Helios-Base | v-prediction | Standard CFG | Highest quality |
| Helios-Mid | x0-prediction | CFG-Zero, multi-scale sampling | Balanced |
| Helios-Distilled | x0-prediction | Custom distilled scheduler | Fastest |
Approximate inference times on a single H100 GPU at 24 FPS:
| Frames | Video Duration | Generation Time |
|---|---|---|
| 99 | ~4 seconds | ~4 seconds |
| 429 | ~18 seconds | ~18 seconds |
| 1,452 | ~60 seconds | ~60 seconds |
The distilled variant closes most of the speed gap while retaining competitive quality. The base model represents the ceiling for output fidelity. Frame counts should be multiples of 33 for optimal performance across all variants.
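The multiple-of-33 recommendation can be wrapped in a small helper that snaps a target duration to a valid frame count at 24 FPS. `snap_frames` is a hypothetical convenience function for illustration, not part of the Helios codebase.

```python
def snap_frames(seconds, fps=24, chunk=33):
    """Round a target duration to the nearest positive multiple of the
    33-frame chunk size the release recommends."""
    raw = seconds * fps
    n_chunks = max(1, round(raw / chunk))
    return n_chunks * chunk

# The ~4 s, ~18 s, and ~60 s targets land exactly on the frame
# counts listed in the inference-time table above.
assert snap_frames(4) == 99      # 3 x 33
assert snap_frames(18) == 429    # 13 x 33
assert snap_frames(60) == 1452   # 44 x 33
```

Note that a nominal 60-second clip snaps to 1,452 frames rather than 1,440, which matches the frame count the release reports for minute-scale generation.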
Helios text-to-video long showcase (1,440 frames)
Benchmark Performance
Helios was tested against a range of recent models including Pyramid Flow, MAGI-1, InfinityStar, SkyReels V2, and CausVid. It outperforms existing distilled models across both short video benchmarks (~121 frames) and long video benchmarks (~1,440 frames), while matching or exceeding base models that are considerably slower to run.
The comparison against InfinityStar, itself a 10x speedup over standard diffusion models, is particularly notable. Helios holds its own on quality while generating at real time speeds that InfinityStar does not reach.
Supported Tasks
Helios natively supports three generation modes within a single unified architecture:
- Text-to-Video (T2V): Generate video from text prompts.
- Image-to-Video (I2V): Animate a still image with a text-guided motion prompt.
- Video-to-Video (V2V): Transform or extend existing video footage.
An interactive generation mode is also documented in the repository, suggesting live generation capabilities beyond standard batch pipelines.
Open Source and Deployment
Helios was released on March 4, 2026, under the Apache 2.0 license, which permits commercial use, modification, and redistribution. The full technical report, training code, and all model weights are publicly available.
The repository includes day zero integration with Diffusers, vLLM-Omni, and SGLang-Diffusion. Ascend NPU support is also available, reaching approximately 10 FPS on Huawei hardware. Minimum requirements are Python 3.11.2 and PyTorch 2.7.1 with CUDA 11.8+.
- Paper: arXiv 2603.04379
- Code: github.com/PKU-YuanGroup/Helios
- Project page: pku-yuangroup.github.io/Helios-Page
What This Means for Video Production
The combination of minute scale duration and real time throughput removes two of the most persistent obstacles for AI video in production contexts. Duration limits have constrained the narrative utility of AI video, since most tools cap out at 5 to 10 seconds. Speed limits have constrained iterative workflows, since waiting minutes per clip makes shot exploration impractical.
Helios addresses both simultaneously. A filmmaker using the model for previs or storyboard animation can generate a 60 second sequence at full speed, review it, prompt again, and iterate within minutes. For longer sequences like those explored in BlockVid's minute long video research, Helios offers a higher parameter model that does not trade quality for length. The Apache 2.0 license means studios and independent productions can integrate it into commercial pipelines without legal ambiguity.
You can explore the full range of AI video generation available today at AI FILMS Studio.
Sources
arXiv | Peking University | Hugging Face | GitHub: PKU-YuanGroup/Helios | Project Page

