Stable Audio 3: Open Weight Music and SFX for Filmmakers

May 22, 2026

On May 20, 2026, Stability AI released Stable Audio 3, a four-model family that covers music composition, sound effects, and audio editing within a single architecture. Three of the four models ship with open weights on Hugging Face. The fourth is available only through the API and paid self-hosting.

The release matters for filmmakers because a single family handles two of the most common post production audio jobs, music score generation and on screen sound design, and Stability AI says it was trained only on licensed audio.

The Four Model Lineup

The family scales from on device sound effects up to full song composition. The two small models target mobile devices and consumer laptops; the medium and large variants target studio workstations.

[Figure: Stable Audio 3.0 model comparison chart listing deployment, parameters, track length, inference time, and best use for each variant. Image courtesy of Stability AI.]

Stability AI's published specifications list inference times on an NVIDIA H200 GPU of 0.44 seconds for the two small variants, 1.31 seconds for the medium, and 1.80 seconds for the large. Maximum track length is two minutes for the small models and six minutes twenty seconds for the medium and large. Parameter counts are 459M for each of the two small models (music and SFX), 1.4B for the medium, and 2.7B for the large.
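For a rough sense of what those parameter counts mean on disk or in memory, the arithmetic can be sketched as below. This is only a back-of-envelope estimate: the variant labels are informal, 16-bit weight storage is an assumption rather than a published detail, and activations or any components not included in the quoted counts are ignored.

```python
# Parameter counts as quoted above; the dictionary keys are informal labels.
PARAMS = {
    "small-music": 459e6,
    "small-sfx": 459e6,
    "medium": 1.4e9,
    "large": 2.7e9,
}


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    bytes_per_param=2 assumes 16-bit weights (an assumption, not a spec).
    Activations and auxiliary components are not counted.
    """
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, count in PARAMS.items():
        print(f"{name}: ~{weight_memory_gb(count):.2f} GB of weights at 16-bit")
```

Under these assumptions the small models come out a little under 1 GB of weights each, which is consistent with the on device positioning, while the medium and large variants land in the range of a few gigabytes.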

License Terms

Stable Audio 3 is released under the Stability AI Community License. The license lets creators own their outputs and distribute or commercialize them freely. Organizations exceeding $1 million in annual recurring revenue need an enterprise license, which Stability AI says includes legal indemnification.

Three variants are open weight downloads on Hugging Face: stable-audio-3-small-music, stable-audio-3-small-sfx, and stable-audio-3-medium. The large model is API only.
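A minimal sketch of fetching one of the open weight repositories with the huggingface_hub library follows. The repository ids are the ones listed in the sources for this post; the short variant labels and the helper names are hypothetical. The sketch also assumes the repositories can be fetched with a plain snapshot download, which may additionally require accepting the license on Hugging Face and logging in.

```python
from typing import Optional

# Repository ids as listed in the sources; the short keys are informal labels.
OPEN_WEIGHT_REPOS = {
    "small-music": "stabilityai/stable-audio-3-small-music",
    "small-sfx": "stabilityai/stable-audio-3-small-sfx",
    "medium": "stabilityai/stable-audio-3-medium",
}


def repo_for(variant: str) -> str:
    """Return the Hugging Face repo id for an open weight variant."""
    try:
        return OPEN_WEIGHT_REPOS[variant]
    except KeyError:
        raise ValueError(f"{variant!r} has no open weight repository listed") from None


def download(variant: str, local_dir: Optional[str] = None) -> str:
    """Download every file in the variant's repository and return the local path."""
    # Imported lazily so the mapping above is usable without the library installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_for(variant), local_dir=local_dir)


if __name__ == "__main__":
    print(download("small-sfx"))
```

Asking for the large variant raises a ValueError, which mirrors the fact that it is not offered as a download.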

What Filmmakers Get

The architecture uses a semantic acoustic autoencoder that lets the model edit a section of an existing audio clip in place. This capability, audio inpainting, is the most directly relevant one for film post production, since score revisions and replacement sound effects are usually small edits inside an otherwise finished track.
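To make "edit in place" concrete, here is a waveform-level illustration of what an inpainting result looks like from the editor's side: only the selected region changes, and everything outside it is untouched. This is not how Stable Audio 3 performs inpainting internally; the function names are hypothetical, the replacement samples stand in for whatever a model would generate, and the linear crossfade is an assumption added to avoid clicks at the edit boundaries.

```python
def region_to_samples(start_s: float, end_s: float, sample_rate: int) -> tuple:
    """Convert an edit region given in seconds to (start, end) sample indices."""
    return round(start_s * sample_rate), round(end_s * sample_rate)


def splice(original: list, replacement: list, start: int, fade: int = 0) -> list:
    """Return a copy of `original` with `replacement` written in at `start`.

    The first and last `fade` samples of the region are linearly blended
    with the original audio; samples outside the region are left as they were.
    """
    end = start + len(replacement)
    if start < 0 or end > len(original):
        raise ValueError("replacement must fit inside the original clip")
    if 2 * fade > len(replacement):
        raise ValueError("fade is longer than half the replacement")

    out = list(original)
    for i, new in enumerate(replacement):
        weight = 1.0
        if fade:
            # Ramps up from the left edge and down toward the right edge.
            weight = min(1.0, (i + 1) / (fade + 1), (len(replacement) - i) / (fade + 1))
        out[start + i] = (1.0 - weight) * original[start + i] + weight * new
    return out


if __name__ == "__main__":
    clip = [0.0] * 10            # stand-in for the finished track
    patch = [1.0] * 4            # stand-in for model-generated audio
    print(splice(clip, patch, start=3, fade=1))
```

In a real session the region would come from timeline markers, for example region_to_samples(1.0, 2.5, 44100) for an edit between 1.0 and 2.5 seconds at 44.1 kHz.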

The small models run on mobile devices and consumer laptops, opening the door to on set scoring and sound design previews before the team commits to a direction. The medium model supports the full six minute, twenty second composition window, which covers typical short film and trailer score lengths.
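The choice between variants for a given cue can be sketched as a simple lookup against the published track length limits (two minutes for the small models, six minutes twenty seconds for the medium and large). The variant labels and the helper name are hypothetical, and the sketch filters only on duration and weight availability, not on whether a variant is meant for music or sound effects.

```python
# Maximum track length in seconds, from the published specifications.
MAX_SECONDS = {
    "small-music": 2 * 60,
    "small-sfx": 2 * 60,
    "medium": 6 * 60 + 20,
    "large": 6 * 60 + 20,
}

# Variants the article lists as open weight downloads.
OPEN_WEIGHT = {"small-music", "small-sfx", "medium"}


def variants_for(duration_s: float, open_weights_only: bool = False) -> list:
    """List the variants whose maximum track length covers `duration_s`."""
    return [
        name
        for name, limit in MAX_SECONDS.items()
        if duration_s <= limit and (not open_weights_only or name in OPEN_WEIGHT)
    ]


if __name__ == "__main__":
    print(variants_for(90))                            # a short cue
    print(variants_for(200, open_weights_only=True))   # a trailer-length score
```

For a cue longer than two minutes that must run on downloadable weights, the only match is the medium model, which is the case the paragraph above describes.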

Trained on Licensed Audio

Stability AI says the entire Stable Audio 3 family is trained on fully licensed audio. TechCrunch references the company's prior licensing partnerships with Warner Music Group and Universal Music Group as the context for that claim. The licensed training position matters for studios that have pushed back on models trained on scraped or unverified audio sources.

Where It Fits in the AI FILMS Studio Audio Stack

Stable Audio 3 sits alongside the music and sound tools already wired into the AI FILMS Studio music workspace and the sound workspace. For voice work, the voice workspace remains the entry point. Filmmakers who want a comparison data point for video and audio together can look at HunyuanVideo Foley, the open source video to audio companion model that ships with its own licensing terms.

Recent open weight peers worth reading alongside Stable Audio 3 include HiDream O1 Image on the image side and SANA WM on the world model side. Together they map the current open weight frontier across image, audio, and video.


Sources

Project Page: Meet Stable Audio 3, Stability AI
Hugging Face: stabilityai/stable-audio-3-small-music · stabilityai/stable-audio-3-small-sfx · stabilityai/stable-audio-3-medium
License: Stability AI Community License
TechCrunch: Stability AI releases a new audio model that can create 6-minute songs
Digital Music News: Stability AI Releases Stable Audio 3.0, Authorized Training
Music Business Worldwide: Stability AI launches new audio models that can generate 6-minute music tracks