Magenta RealTime 2: Google's Open Source Music AI Runs Live Inside Your DAW

June 7, 2026

Updated: July 22, 2026

Google released Magenta RealTime 2 on June 4, 2026. The model generates 48 kHz stereo audio with a control latency of roughly 200 milliseconds on consumer Apple Silicon hardware. Unlike previous open source music AI tools that require offline batch rendering, MRT2 responds fast enough to use during a live performance or a film scoring session with picture running.

Magenta RealTime 2 generating live music. Courtesy of Google Magenta.

Why 200ms Changes the Workflow

The original Magenta RealTime required a high power GPU or TPU and delivered audio with roughly 3 seconds of control latency. Version 2 cuts that to 200ms, a 15x reduction, and runs on a MacBook Air. The internal frame size dropped from 2 seconds to 40ms.
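The two ratios in that paragraph are easy to check. A minimal sketch, using only the rounded figures quoted in this article (none of them are measurements, and the constant names are illustrative):

```python
# Rounded figures quoted in the article; nothing here is measured.
V1_CONTROL_LATENCY_MS = 3000  # "roughly 3 seconds"
V2_CONTROL_LATENCY_MS = 200   # "roughly 200 milliseconds"
V1_FRAME_MS = 2000            # 2 second internal frame
V2_FRAME_MS = 40              # 40 ms internal frame


def reduction_factor(before_ms: float, after_ms: float) -> float:
    """Return how many times smaller the new figure is than the old one."""
    return before_ms / after_ms


latency_reduction = reduction_factor(V1_CONTROL_LATENCY_MS, V2_CONTROL_LATENCY_MS)
frame_reduction = reduction_factor(V1_FRAME_MS, V2_FRAME_MS)

print(f"control latency: {latency_reduction:.0f}x lower")  # 15x
print(f"internal frame:  {frame_reduction:.0f}x smaller")  # 50x
```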

That latency figure matters for film work specifically. At 200ms, a composer can trigger a style change while watching picture playback and hear the audio response before the next edit. No previous open source music AI achieved this on hardware a working composer actually owns.
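To put the two latency figures in picture terms, the sketch below converts them to elapsed frames. The 24 fps rate is an assumption (a common cinema frame rate); the article does not name one, and `frames_elapsed` is an illustrative helper.

```python
def frames_elapsed(latency_ms: float, fps: float = 24.0) -> float:
    """Number of picture frames that pass during a given latency.

    fps defaults to 24, an assumed cinema frame rate.
    """
    return latency_ms * fps / 1000.0


for label, latency_ms in [("version 1, ~3 s", 3000), ("MRT2, ~200 ms", 200)]:
    print(f"{label}: {frames_elapsed(latency_ms):.1f} frames at 24 fps")
# version 1, ~3 s: 72.0 frames at 24 fps
# MRT2, ~200 ms: 4.8 frames at 24 fps
```

Under that assumption, a version 1 response lands three seconds of picture later, while an MRT2 response lands within about five frames of the gesture.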

The 200ms figure matters for a practical reason: for style and arrangement changes, a delay of that size is short enough that the output still feels tied to the control gesture. At 3 seconds, the composer hears the result of an earlier decision. At 200ms, the response reads as close to immediate. That perceptual shift is what changes MRT2 from a generative tool into an instrument.

The Architecture Behind 200ms Latency

The version 1 latency came primarily from the Python runtime overhead and the 2 second internal frame size. Every generation cycle had to process 2 seconds of audio context, and the Python audio I/O layer added serialization latency on top of that.

MRT2 addresses both problems. The internal frame size dropped to 40ms, which means each generation step processes 50x less audio context per cycle. A separate C++ inference engine handles the low latency audio path on Apple Silicon natively, bypassing the Python runtime entirely during generation. Audio I/O runs through the C++ layer and does not touch Python after initialization.
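The 40ms frame size sets the budget the engine has to meet. A rough sketch of the arithmetic, assuming the general rule for streaming audio that each frame must be generated in less time than it takes to play back. The 25 ms generation time is a made-up number for illustration, not a published MRT2 measurement.

```python
SAMPLE_RATE_HZ = 48_000  # from the article: 48 kHz output
CHANNELS = 2             # from the article: stereo
FRAME_MS = 40            # from the article: 40 ms internal frame


def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Samples per channel contained in one frame."""
    return sample_rate_hz * frame_ms // 1000


def real_time_factor(generation_ms: float, frame_ms: float) -> float:
    """Generation time divided by playback time; below 1.0 keeps up."""
    return generation_ms / frame_ms


per_channel = samples_per_frame(SAMPLE_RATE_HZ, FRAME_MS)  # 1920
interleaved = per_channel * CHANNELS                        # 3840
frames_per_second = 1000 // FRAME_MS                        # 25

hypothetical_generation_ms = 25.0  # illustrative only, not a measurement
rtf = real_time_factor(hypothetical_generation_ms, FRAME_MS)  # 0.625
keeps_up = rtf < 1.0

print(per_channel, interleaved, frames_per_second, rtf, keeps_up)
```

Any step that takes longer than 40ms, for example an interpreter pause, pushes the factor above 1.0 and the audio output underruns.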

The JAX and MLX backend options allow the model to use native Metal GPU acceleration on Apple Silicon. MLX is Apple's machine learning framework optimized for its unified memory architecture, where CPU and GPU share the same memory pool. That shared memory eliminates the data transfer overhead that GPU acceleration incurs on discrete GPU hardware, which is one reason the MacBook Air runs MRT2 at all despite having no discrete graphics card.

The separation of the generation loop from Python is also what enables the Audio Unit plugin. AU plugins must run inside the host DAW's audio thread with strict real time constraints. A Python process cannot satisfy those constraints, but the C++ engine can.

The latency improvement also has implications for ensemble performance. At 3 seconds of control latency, the response arrives too late for the model to act as a musical partner in group playing, so it suits a solo performer at best. At 200ms, two or three musicians can play together with MRT2 generating accompaniment, and each control input is reflected in the output almost immediately.

Two Model Sizes for Two Use Cases

MRT2 ships in two variants. The Base model has 2.4 billion parameters and requires an M2 Max or M3 Pro at minimum, with a download size of roughly 2.5 GB. The Small model has 230 million parameters and runs on any Apple Silicon Mac, including the base MacBook Air, at roughly 450 MB.

Magenta RealTime 2 model variants. Courtesy of Google Magenta.

Both models generate 48 kHz stereo audio. The Small model trades quality for accessibility. The Base model is the option for production use.

The 450 MB download for the Small model makes it feasible to try before committing storage. The 2.5 GB Base model download is small next to the sample libraries and session files on a typical production machine, so storage is not the barrier. The meaningful constraint is the M2 Max or M3 Pro minimum, which rules out the Base model on base MacBook Pro configurations and on every MacBook Air.
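Dividing the quoted download sizes by the quoted parameter counts gives a rough storage cost per parameter. This is back-of-envelope arithmetic on rounded figures, assuming decimal units (1 GB = 10^9 bytes). The article does not state the weight precision or packaging of either variant, so the result is only an implication of the published numbers.

```python
def bytes_per_parameter(download_bytes: float, parameters: float) -> float:
    """Average bytes of download per model parameter."""
    return download_bytes / parameters


# Rounded figures from the article, decimal units assumed.
base = bytes_per_parameter(2.5e9, 2.4e9)   # Base: ~2.5 GB, 2.4 billion params
small = bytes_per_parameter(450e6, 230e6)  # Small: ~450 MB, 230 million params

print(f"Base:  {base:.2f} bytes per parameter")   # 1.04
print(f"Small: {small:.2f} bytes per parameter")  # 1.96
```

The Small figure is close to two bytes per parameter and the Base figure close to one, which may point to different weight formats for the two downloads, though the article does not confirm that.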

Open Weights Under CC-BY-4.0

Google publishes MRT2 through its Magenta team within Google DeepMind. The code is released under Apache 2.0. Model weights are under Creative Commons Attribution 4.0 International, which permits commercial use with attribution.

Weights are available on Hugging Face. The Python library, called magenta-rt, supports JAX and MLX backends. A C++ inference engine handles the low latency path on Apple Silicon natively, without routing audio through Python at runtime.

What CC-BY-4.0 Means in Practice

CC-BY-4.0 is the most permissive of the standard Creative Commons licenses; only the CC0 public domain dedication asks for less. It permits commercial use, adaptation, and redistribution. The main requirement is attribution: the Google Magenta team must be credited in the project, release, or production that uses the weights.

For a film scoring context, attribution typically means a credit in the film's music or technology acknowledgments. There is no requirement to open source the score, share derivative models, or report revenue. A composer using MRT2 to generate cues for a theatrical release meets the license requirement by including a credit.

The CC-BY-4.0 license on the weights paired with Apache 2.0 on the code is among the more permissive combinations available for production music AI. Apache 2.0 permits commercial use and modification of the inference code. CC-BY-4.0 permits commercial use of the weights, including generating music with them for paid work, as long as attribution is given. There are no dual licensing fees, no enterprise tiers, and no revenue thresholds.

Three Ways to Control It

MRT2 accepts text prompts, short audio clips, and MIDI input simultaneously. Describe a style in text, provide a reference clip, and play a MIDI melody. The model blends all three and adjusts its output in under a second as any input changes.

This distinguishes MRT2 from standard music AI generation tools, where you submit a prompt and wait for a render. The model operates as an ongoing generative process rather than a one shot output.

MIDI Control and the Film Scoring Workflow

MIDI input is the control path that most directly integrates MRT2 into a professional scoring session. A composer playing a melody on a MIDI keyboard gets AI generated harmonic and textural accompaniment in under a second, without stopping to type a prompt or select a style preset. The melodic content from the keyboard guides the model's generation.

This workflow has a practical advantage over text prompt control in picture lock scenarios. A composer synchronizing music to edited picture is making moment to moment decisions about phrasing and rhythm that are faster to express by playing than by describing. MIDI input lets that expressive decision reach the model in the same timescale as the compositional choice. A style description can be set once for a cue and adjusted between takes.

The combination of MIDI with simultaneous text and audio reference inputs is also where MRT2 departs most from earlier music AI. Prior tools required a choice: either describe the style in text or provide an audio reference. MRT2 takes all three simultaneously, with the model weighting each input dynamically as it changes. A composer can anchor the harmonic language to a reference clip, define the emotional register in text, and shape the rhythmic content through a MIDI performance.
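One way to picture "weighting each input dynamically" is a weighted average over per-input conditioning vectors. The sketch below is a toy illustration under that assumption only: the article does not describe how MRT2 actually combines text, audio, and MIDI, and the `blend` helper, the vector contents, and the weights are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def blend(embeddings: Dict[str, Vector], weights: Dict[str, float]) -> Vector:
    """Weighted average of same-length vectors.

    Weights are normalised so they sum to 1 across the inputs provided.
    """
    total = sum(weights[name] for name in embeddings)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    dim = len(next(iter(embeddings.values())))
    blended = [0.0] * dim
    for name, vector in embeddings.items():
        if len(vector) != dim:
            raise ValueError(f"{name!r} has length {len(vector)}, expected {dim}")
        share = weights[name] / total
        for i, value in enumerate(vector):
            blended[i] += share * value
    return blended


# Toy 2-dimensional vectors standing in for real conditioning signals.
inputs = {"text": [1.0, 0.0], "audio": [0.0, 1.0], "midi": [1.0, 1.0]}
# The performer leans on the MIDI input: it gets half of the total weight.
control = blend(inputs, {"text": 1.0, "audio": 1.0, "midi": 2.0})
print(control)  # [0.75, 0.75]
```

Raising one weight while a cue plays shifts the blended vector toward that input without discarding the other two, which is the behaviour the paragraph above describes.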

Four Apps Included at Launch

Google ships four applications with the release. Jam is a standalone app with style presets and full MIDI control. Collider blends two style inputs in real time, generating hybrid textures between tonal registers during a session.

The MRT2 Plugin ships as an Audio Unit (AU) component, usable directly inside Logic Pro, GarageBand, and any AU compatible DAW. This is the most direct application for film post production: the model runs inside an existing session so a composer can trigger AI music variations while watching picture, without leaving the DAW or exporting to a separate file.

MRT2 running as an Audio Unit plugin inside a DAW. Courtesy of Google Magenta.

A fourth option supports creative coding extensions for building custom integrations on top of the inference library.

Film Scoring Without Leaving the DAW

Most open source music AI requires a separate generation pass outside the DAW. MRT2's AU plugin removes that step. The model runs locally on Apple Silicon with no cloud dependency. There is no upload latency, no subscription cost, and no footage leaves the machine.

Collider's style blending addresses a recurring scoring challenge. Transitions between emotional registers in a cue often need a brief textural hybrid, something Collider generates from a live prompt change rather than a separate offline render. The technique works across any two style presets loaded into the model.
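A textural hybrid between two registers can be sketched as a linear crossfade between two style vectors, stepped once per 40ms frame. This is an illustration of the idea, not Collider's implementation: the article does not say how Collider interpolates, and `crossfade_schedule` and the toy vectors are hypothetical.

```python
from typing import List

Vector = List[float]


def crossfade_schedule(
    style_a: Vector, style_b: Vector, transition_ms: int, frame_ms: int = 40
) -> List[Vector]:
    """Linearly interpolate from style_a to style_b, one step per frame."""
    steps = max(1, transition_ms // frame_ms)
    schedule = []
    for k in range(steps + 1):
        t = k / steps  # 0.0 at the start of the transition, 1.0 at the end
        schedule.append([(1 - t) * a + t * b for a, b in zip(style_a, style_b)])
    return schedule


# Toy 2-dimensional style vectors; a 200 ms transition spans five 40 ms frames.
schedule = crossfade_schedule([1.0, 0.0], [0.0, 1.0], transition_ms=200)
print(len(schedule))  # 6 (both endpoints plus four intermediate blends)
print(schedule[0])    # [1.0, 0.0]
print(schedule[-1])   # [0.0, 1.0]
```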

The local execution model also matters for projects that restrict external uploads. Composers receiving picture dailies under NDA can use MRT2 inside Logic Pro on their local machine without routing footage or audio through a cloud service. The generation performance is determined entirely by the local hardware, with no network dependency at any point in the session.

For dialogue and voice in the same pipeline, MisoTTS was released the same week as an 8B open weights voice model with one shot cloning and 110ms latency. For audio analysis and understanding rather than generation, MOSS-Audio covers that side of the workflow. For projects that need speech, sound effects, and music generated together and locked to picture, Foley-Omni handles all three in a single pass under the MIT license. Film composers working in the AI FILMS Studio music workspace can combine these tools across the full audio post pipeline.

How MRT2 Fits the Open Source Music AI Ecosystem

MRT2 fills a specific gap in the open source music AI ecosystem. It achieves live music generation at latency below 300ms on consumer hardware. Other open source music generation models, including those designed for film scoring, require an offline render step and are not suitable for use during picture playback. MRT2 is the first open weights model to remove that constraint on hardware a working composer is likely to own.

The tool complements rather than replaces the batch generation tools already in use. A composer can use MRT2 to explore a cue's emotional direction in real time during a scoring session, then use a higher quality offline model to render the final output at the resolution and length the picture requires. The two categories of tool address different stages of the same workflow.

The open licensing stack also makes the combined workflow practical for production use. With MRT2's weights under CC-BY-4.0 and the inference code under Apache 2.0, a composer building a scoring pipeline is not restricted to a single vendor's model. The tools can be combined and replaced without license renegotiation as better models become available.

For composers who need a full song generation tool rather than a real time generative layer, ACE-Step 1.5 runs locally on consumer GPUs, scores 8.09 on SongEval (above Suno v5), and ships under an MIT license.

Sources

Project Page: magenta.withgoogle.com/magenta-realtime-2
Apps & Plugins: magenta.withgoogle.com/mrt2
GitHub: magenta/magenta-realtime
Hugging Face: google/magenta-realtime-2
arXiv: Live Music Models (2508.04651)
GIGAZINE: Google has released Magenta RealTime 2