
Magenta RealTime 2: Google's Open Source Music AI Runs Live Inside Your DAW

June 7, 2026



Google released Magenta RealTime 2 (MRT2) on June 4, 2026. The model generates 48 kHz stereo audio with a control latency of roughly 200 milliseconds on consumer Apple Silicon hardware. Unlike most earlier open source music AI tools, which require offline batch rendering, MRT2 responds fast enough to use during a live performance or a film scoring session with picture running.

Magenta RealTime 2 generating live music. Courtesy of Google Magenta.

Why 200ms Changes the Workflow

The original Magenta RealTime required a high-powered GPU or TPU and delivered audio with roughly 3 seconds of control latency. Version 2 cuts that to 200ms, a 15x reduction, and its Small variant runs on a MacBook Air. The internal frame size dropped from 2 seconds to 40ms.
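These figures are easy to sanity-check. The short Python sketch below computes both reduction factors and, assuming the 40ms frame is measured at the 48 kHz output rate, how many samples per channel one frame holds. The helper names are illustrative only and are not part of magenta-rt.

```python
SAMPLE_RATE_HZ = 48_000  # MRT2 output rate quoted in the article


def reduction_factor(old_ms: float, new_ms: float) -> float:
    """How many times smaller the new duration is than the old one."""
    return old_ms / new_ms


def samples_per_frame(frame_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Samples per channel in one frame (assumes frame_ms * rate divides evenly by 1000)."""
    return frame_ms * sample_rate_hz // 1000


if __name__ == "__main__":
    print(reduction_factor(3000, 200))  # control latency reduction
    print(reduction_factor(2000, 40))   # frame size reduction
    print(samples_per_frame(2000))      # original 2 s frame, per channel
    print(samples_per_frame(40))        # MRT2 40 ms frame, per channel
```

The control latency drops 15x, while the frame itself shrinks 50x, from 96,000 to 1,920 samples per channel.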

That latency figure matters for film work specifically. At 200ms, a composer can trigger a style change while watching picture playback and hear the audio response before the next edit. No previous open source music AI achieved this on hardware a working composer actually owns.
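To translate the latency into picture terms, multiply it by the frame rate. The sketch below uses common film and video rates (24, 25, and 30 fps), chosen for illustration rather than taken from Google's material.

```python
def latency_in_frames(latency_ms: float, fps: float) -> float:
    """Convert a latency in milliseconds to a number of picture frames."""
    return latency_ms / 1000.0 * fps


for fps in (24.0, 25.0, 30.0):
    new = latency_in_frames(200, fps)   # MRT2 control latency
    old = latency_in_frames(3000, fps)  # original Magenta RealTime
    print(f"{fps:g} fps: MRT2 ~{new:.1f} frames, original ~{old:.0f} frames")
```

At 24 fps, a 200ms response arrives under five frames after the change, against 72 frames for the original model's roughly 3-second latency.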

Two Model Sizes for Two Use Cases

MRT2 ships in two variants. The Base model has 2.4 billion parameters, requires at least an M2 Max or M3 Pro, and is a download of roughly 2.5 GB. The Small model has 230 million parameters, runs on any Apple Silicon Mac including the base MacBook Air, and is a download of roughly 450 MB.

Magenta RealTime 2 model variants. Courtesy of Google Magenta.

Both models generate 48 kHz stereo audio. The Small model trades quality for accessibility. The Base model is the option for production use.
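Dividing the quoted download sizes by the parameter counts gives a rough sense of how the weights might be stored. Treating "GB" and "MB" as decimal units, the Base figure works out to about one byte per parameter and the Small figure to about two, which would be consistent with 8-bit and 16-bit weights respectively. That reading is an inference from the numbers, not a detail stated in the release, and a download may also contain files other than weights.

```python
def bytes_per_param(download_bytes: float, params: float) -> float:
    """Average storage per parameter implied by a download size."""
    return download_bytes / params


VARIANTS = {
    # name: (approximate download size in bytes, parameter count), as quoted above
    "Base": (2.5e9, 2.4e9),
    "Small": (450e6, 230e6),
}

for name, (size, params) in VARIANTS.items():
    print(f"{name}: ~{bytes_per_param(size, params):.2f} bytes per parameter")
```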

Open Weights Under CC-BY-4.0

Google publishes MRT2 through its Magenta team within Google DeepMind. The code is released under Apache 2.0. Model weights are under Creative Commons Attribution 4.0 International, which permits commercial use with attribution.

Weights are available on Hugging Face. The Python library, magenta-rt, supports JAX and MLX backends. A C++ inference engine handles the low-latency path natively on Apple Silicon, so audio is not routed through Python at runtime.

Three Ways to Control It

MRT2 accepts text prompts, short audio clips, and MIDI input simultaneously. Describe a style in text, provide a reference clip, and play a MIDI melody. The model blends all three and adjusts its output in under a second as any input changes.
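How MRT2 fuses the three inputs internally is not described here. A common approach in prompt-driven generators is to map each prompt to an embedding vector and take a weighted average. The sketch below shows that idea for a text prompt and a reference clip, using made-up three-dimensional vectors. `blend_controls` is a hypothetical helper, not part of magenta-rt, and MIDI is left out on the assumption that a melody would be passed as a separate note stream rather than averaged into a style vector.

```python
import math

Vector = list[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so no prompt dominates by magnitude alone."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def blend_controls(embeddings: list[Vector], weights: list[float]) -> Vector:
    """Hypothetical: weighted average of unit-normalized style embeddings."""
    if not embeddings or len(embeddings) != len(weights):
        raise ValueError("need one weight per embedding")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    out = [0.0] * len(embeddings[0])
    for emb, w in zip(embeddings, weights):
        for i, x in enumerate(normalize(emb)):
            out[i] += w / total * x
    return out


# Made-up embeddings standing in for a text prompt and a reference clip.
text_style = [1.0, 0.0, 0.0]
audio_style = [0.0, 2.0, 0.0]
print(blend_controls([text_style, audio_style], [0.75, 0.25]))
```

Normalizing first means the 3:1 weighting, not the raw size of each vector, decides how strongly each prompt pulls the result.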

This distinguishes MRT2 from standard music AI generation tools, where you submit a prompt and wait for a render. The model operates as an ongoing generative process rather than producing a one-shot output.
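The difference between a one-shot render and an ongoing process is easiest to see as a loop. In the hypothetical sketch below, the generator re-reads the current control before each 40ms frame, so a change is picked up at the next frame boundary instead of after a full render. `stream` and `Frame` are illustrative names, not the magenta-rt API, and the stub produces no audio.

```python
from dataclasses import dataclass
from typing import Callable, Iterator

FRAME_MS = 40  # frame size quoted for MRT2


@dataclass
class Frame:
    index: int
    style: str  # which control was active when this frame was generated


def stream(get_style: Callable[[int], str], n_frames: int) -> Iterator[Frame]:
    """Hypothetical streaming loop: re-read the control before every frame.

    A real model would carry state between frames and return audio;
    this stub only records which style each frame was conditioned on.
    """
    for i in range(n_frames):
        yield Frame(index=i, style=get_style(i))


def first_frame_with(frames: list[Frame], style: str) -> int:
    """Index of the first frame conditioned on the given style."""
    return next(f.index for f in frames if f.style == style)


CHANGE_MS = 100  # the performer switches styles 100 ms into the stream


def controls(i: int) -> str:
    return "tense" if i * FRAME_MS >= CHANGE_MS else "ambient"


frames = list(stream(controls, n_frames=10))
k = first_frame_with(frames, "tense")
print(k, k * FRAME_MS)  # first frame using the new style, and its start time in ms
```

The sketch only models frame-boundary quantization; it says nothing about where the rest of MRT2's roughly 200ms control latency comes from.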

Four Apps Included at Launch

Google ships four applications with the release. Jam is a standalone app with style presets and full MIDI control. Collider blends two style inputs in real time, generating hybrid textures between tonal registers during a session.

The MRT2 Plugin ships as an Audio Unit (AU) component, usable directly inside Logic Pro, GarageBand, and any AU-compatible DAW. This is the most direct application for film post-production: the model runs inside an existing session, so a composer can trigger AI music variations while watching picture without leaving the DAW or exporting a separate file.

MRT2 running as an Audio Unit plugin inside a DAW. Courtesy of Google Magenta.

A fourth option supports creative coding extensions for building custom integrations on top of the inference library.

Film Scoring Without Leaving the DAW

Most open source music AI requires a separate generation pass outside the DAW. MRT2's AU plugin removes that step. The model runs locally on Apple Silicon with no cloud dependency. There is no upload latency, no subscription cost, and no footage leaves the machine.

Collider's style blending addresses a recurring scoring challenge: transitions between emotional registers. A cue moving from tension to resolution often needs a brief textural hybrid, something Collider generates from a live prompt change rather than a separate offline render. The technique works across any two style presets loaded into the model.
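Collider's actual blending method is not documented here. As a hypothetical illustration of scheduling such a transition, the sketch below computes, for each 40ms frame, how much of the second style to mix in: it holds the tension preset, ramps linearly through a hybrid passage, then lands on the resolution preset. `mix_schedule` is an illustrative helper, and the linear ramp is one simple choice among many.

```python
FRAME_MS = 40  # MRT2 frame size quoted in the article


def mix_schedule(start_ms: int, end_ms: int, total_ms: int) -> list[float]:
    """Hypothetical per-frame weight for style B (0.0 = all A, 1.0 = all B).

    Holds style A before start_ms, ramps linearly to style B between
    start_ms and end_ms, then holds style B until total_ms.
    """
    if end_ms <= start_ms:
        raise ValueError("end_ms must be after start_ms")
    weights = []
    for i in range(total_ms // FRAME_MS):
        t = i * FRAME_MS
        w = (t - start_ms) / (end_ms - start_ms)
        weights.append(min(1.0, max(0.0, w)))  # clamp outside the ramp
    return weights


# Tension (A) to resolution (B): a 400 ms hybrid passage starting 200 ms into a 1 s span.
sched = mix_schedule(start_ms=200, end_ms=600, total_ms=1000)
print(sched)
```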

For dialogue and voice in the same pipeline, MisoTTS, an 8B open-weights voice model with one-shot cloning and 110ms latency, was released the same week. For audio analysis and understanding rather than generation, MOSS-Audio covers that side of the workflow. Film composers working in the AI FILMS Studio music workspace can combine these tools across the full audio post pipeline.


Sources

Project Page: magenta.withgoogle.com/magenta-realtime-2
Apps & Plugins: magenta.withgoogle.com/mrt2
GitHub: magenta/magenta-realtime
Hugging Face: google/magenta-realtime-2
arXiv: Live Music Models (2508.04651)
GIGAZINE: Google has released Magenta RealTime 2