Helios: Real Time Long Video Generation at 19.5 FPS on a Single GPU
A research team from Peking University, ByteDance, and Canva has released Helios, a 14 billion parameter autoregressive diffusion model that generates minute scale videos at 19.5 frames per second on a single NVIDIA H100 GPU. The model achieves this without KV-cache, quantization, sparse attention, or any of the standard acceleration techniques the field relies on. It is open source under the Apache 2.0 license and available for commercial use.
Three Core Breakthroughs
Helios is built around three engineering advances that together explain why the model performs so differently from its peers.
Long Video Stability Without Anti-Drift Techniques
Autoregressive video models accumulate errors over time. Each generated frame conditions the next, and small deviations compound into visible drift: repetitive loops, incoherent motion, and gradual scene degradation. The standard industry response has been to add corrective mechanisms like self-forcing, error-banks, and keyframe sampling.
Helios skips all of them. Instead, the team trained the model with targeted strategies that simulate drifting conditions during training itself. The model learns to handle distributional shift from within, rather than relying on external guardrails. This allows it to maintain coherence through up to 1,440 frames, roughly one minute at 24 FPS, without visible degradation.
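The article does not spell out which training strategies Helios uses to simulate drift. A common technique in this family of models is noise augmentation of the conditioning frames, so the model learns to generate from imperfect context rather than the clean history it sees under teacher forcing. The sketch below illustrates that general idea in NumPy; `corrupt_history` and its noise schedule are illustrative assumptions, not the paper's recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt_history(history, max_sigma=0.3, rng=rng):
    """Simulate inference-time drift during training by perturbing the
    conditioning (history) frames with randomly scaled Gaussian noise,
    so the model learns to stay coherent on imperfect context."""
    sigma = rng.uniform(0.0, max_sigma)          # per-sample drift strength
    noise = rng.normal(size=history.shape)       # Gaussian perturbation
    return history + sigma * noise, sigma

# Toy batch: 8 context frames of 16x16 grayscale "video"
history = rng.normal(size=(8, 16, 16))
corrupted, sigma = corrupt_history(history)
```

At inference time no noise is added; the point is that the training distribution already covers the slightly-off context the model produces for itself during long rollouts.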
Helios: minute long video generation without drift
Real Time Speed Without Standard Acceleration
The 19.5 FPS figure is particularly striking because Helios reaches it through architectural compression rather than inference tricks. The model aggressively compresses historical and noisy context representations and reduces sampling steps. The result is that a 14 billion parameter model incurs computational costs similar to a 1.3 billion parameter model at inference time.
No KV-cache. No causal masking shortcuts. No TinyVAE. No quantization. The speed is native to the architecture, not bolted on afterward.
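The release does not publish token counts or compression ratios, but some back-of-envelope arithmetic (my numbers, not the paper's) shows why compressing the historical context pays off so dramatically: self-attention cost grows quadratically with the number of tokens attended over.

```python
def attention_flops(n_tokens, dim):
    """Rough FLOPs for one self-attention layer: the QK^T and AV matmuls
    each cost ~2 * n^2 * d multiply-adds (projections and constants omitted)."""
    return 4 * n_tokens**2 * dim

# Illustrative figures only: suppose the history contributes 1,560 tokens
# and architectural compression keeps 1 token in 8.
full = attention_flops(1560, 128)
compressed = attention_flops(1560 // 8, 128)
ratio = full / compressed   # 8x fewer tokens -> 64x fewer attention FLOPs
```

An 8x reduction in context tokens yields a 64x reduction in attention compute, which is the kind of quadratic saving that lets a 14B model run at the cost of a much smaller one without caching or quantization tricks.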
Helios text-to-video short showcase
Training Efficiency at Scale
Most 14B model training runs require tensor parallelism, pipeline parallelism, or sharding frameworks to manage GPU memory. Helios was trained without any of them. The team's infrastructure optimizations allow batch sizes comparable to those of image diffusion model training and fit up to four full 14B instances within 80 GB of GPU memory.
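The article does not describe how four instances fit, but simple arithmetic (mine, not the paper's) shows why that figure implies aggressive memory optimization: plain bf16 weights alone would overflow the budget.

```python
def avg_bytes_per_param(total_gb, n_models, params):
    """Average storage budget per parameter when fitting several
    full model instances into one GPU's memory."""
    return total_gb * 1e9 / (n_models * params)

# Plain bf16 weights: 14e9 params x 2 bytes = 28 GB per instance,
# so four unoptimized copies (112 GB) would overflow an 80 GB card.
bf16_gb = 14e9 * 2 / 1e9
# Fitting four full 14B instances in 80 GB implies an average budget
# of ~1.43 bytes/param, i.e. sub-16-bit storage or shared buffers.
budget = avg_bytes_per_param(80, 4, 14e9)
```

Whatever the exact mechanism, the headline claim only works with well under two bytes per parameter on average, which is what makes single-GPU fine-tuning plausible.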
This has practical significance beyond the lab. Researchers and developers who want to fine-tune or experiment with Helios face a much lower infrastructure barrier than comparable models at this parameter scale.
Model Variants and Approximate Inference Times
The release includes three model variants targeting different quality and speed tradeoffs:
| Variant | Prediction | Scheduler | Priority |
|---|---|---|---|
| Helios-Base | v-prediction | Standard CFG | Highest quality |
| Helios-Mid | x0-prediction | CFG-Zero, multi-scale sampling | Balanced |
| Helios-Distilled | x0-prediction | Custom distilled scheduler | Fastest |
Approximate inference times on a single H100 GPU at 24 FPS:
| Frames | Video Duration | Generation Time |
|---|---|---|
| 99 | ~4 seconds | ~4 seconds |
| 429 | ~18 seconds | ~18 seconds |
| 1,452 | ~60 seconds | ~60 seconds |
The distilled variant closes most of the speed gap while retaining competitive quality. The base model represents the ceiling for output fidelity. Frame counts should be multiples of 33 for optimal performance across all variants.
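The multiple-of-33 recommendation can be wrapped in a small helper that snaps a target duration to a valid frame count at 24 FPS. `snap_frames` is a hypothetical convenience function for illustration, not part of the Helios codebase.

```python
def snap_frames(seconds, fps=24, chunk=33):
    """Round a target duration to the nearest positive multiple of the
    33-frame chunk size the release recommends."""
    raw = seconds * fps
    n_chunks = max(1, round(raw / chunk))
    return n_chunks * chunk

# The ~4 s, ~18 s, and ~60 s targets land exactly on the frame
# counts listed in the inference-time table above.
assert snap_frames(4) == 99      # 3 x 33
assert snap_frames(18) == 429    # 13 x 33
assert snap_frames(60) == 1452   # 44 x 33
```

Note that a nominal 60-second clip snaps to 1,452 frames rather than 1,440, which matches the frame count the release reports for minute-scale generation.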
Helios text-to-video long showcase (1,440 frames)
Benchmark Performance
Helios was tested against a range of recent models including Pyramid Flow, MAGI-1, InfinityStar, SkyReels V2, and CausVid. It outperforms existing distilled models across both short video benchmarks (~121 frames) and long video benchmarks (~1,440 frames), while matching or exceeding base models that are considerably slower to run.
The comparison against InfinityStar, itself a 10x speedup over standard diffusion models, is particularly notable. Helios holds its own on quality while generating at real time speeds that InfinityStar does not reach.
Supported Tasks
Helios natively supports three generation modes within a single unified architecture:
- Text-to-Video (T2V): Generate video from text prompts.
- Image-to-Video (I2V): Animate a still image with a text-guided motion prompt.
- Video-to-Video (V2V): Transform or extend existing video footage.
An interactive generation mode is also documented in the repository, suggesting live generation capabilities beyond standard batch pipelines.
Open Source and Deployment
Helios was released on March 4, 2026, under the Apache 2.0 license, which permits commercial use, modification, and redistribution. The full technical report, training code, and all model weights are publicly available.
The repository includes day zero integration with Diffusers, vLLM-Omni, and SGLang-Diffusion. Ascend NPU support is also available, reaching approximately 10 FPS on Huawei hardware. Minimum requirements are Python 3.11.2 and PyTorch 2.7.1 with CUDA 11.8+.
- Paper: arXiv 2603.04379
- Code: github.com/PKU-YuanGroup/Helios
- Project page: pku-yuangroup.github.io/Helios-Page
What This Means for Video Production
The combination of minute scale duration and real time throughput removes two of the most persistent obstacles for AI video in production contexts. Duration limits have constrained the narrative utility of AI video, since most tools cap out at 5 to 10 seconds. Speed limits have constrained iterative workflows, since waiting minutes per clip makes shot exploration impractical.
Helios addresses both simultaneously. A filmmaker using the model for previs or storyboard animation can generate a 60 second sequence at full speed, review it, prompt again, and iterate within minutes. For longer sequences like those explored in BlockVid's minute long video research, Helios offers a higher parameter model that does not trade quality for length. The Apache 2.0 license means studios and independent productions can integrate it into commercial pipelines without legal ambiguity.
You can explore the full range of AI video generation available today at AI FILMS Studio.
Sources
arXiv | Peking University | Hugging Face | GitHub: PKU-YuanGroup/Helios | Project Page

