LTX-2.3: Lightricks Upgrades Its Open Source Audio Video Model

Lightricks has released LTX-2.3, a significant update to its open source audio video foundation model. The new version brings improved visual quality, better prompt adherence, and a suite of upscaler models that push generated content toward higher resolutions and smoother frame rates.
What Is LTX-2.3?
LTX-2.3 is a DiT-based (Diffusion Transformer) audio video foundation model that generates synchronized video and audio in a single pass. Built on the same architecture as LTX-2, the 2.3 release focuses on refinement rather than architectural overhaul. The core model has 22 billion parameters and ships in two variants: a full dev model and a distilled version.
The model supports a broad range of generation tasks in a single unified system:
- text-to-video
- image-to-video
- video-to-video
- audio-to-video and video-to-audio
- image and text to audio video
This multimodal flexibility makes LTX-2.3 one of the most capable open source video models available today.
What Changed in 2.3
The headline improvements in this release are audio and visual quality. Lightricks reports stronger prompt adherence across both modalities, meaning the model follows text descriptions more precisely when generating the visual scene and its accompanying sound.
The distilled variant now runs in just 8 steps with a classifier-free guidance value of 1, making inference substantially faster without a major quality penalty. For creators who need rapid iteration, this is the practical path to quick results.
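Part of the speedup follows directly from the guidance setting: with classifier-free guidance at 1, the sampler needs only one transformer forward pass per step instead of the two (conditional plus unconditional) that active guidance requires. A minimal sketch of that arithmetic, where the non-distilled step count and CFG scale are illustrative assumptions rather than official defaults:

```python
def forward_passes(steps: int, cfg_scale: float) -> int:
    """Count transformer forward passes for a diffusion sampling run.

    With classifier-free guidance active (cfg_scale > 1), each step runs
    two passes: one conditional and one unconditional.
    """
    passes_per_step = 2 if cfg_scale > 1 else 1
    return steps * passes_per_step

# Distilled model: 8 steps at CFG 1 -> 8 forward passes.
distilled = forward_passes(steps=8, cfg_scale=1.0)

# A hypothetical non-distilled run (step count and CFG are assumptions):
# 40 steps with guidance enabled -> 80 forward passes.
dev_like = forward_passes(steps=40, cfg_scale=4.0)

print(distilled, dev_like)  # 8 80
```

Under these assumed settings the distilled path does an order of magnitude less transformer work per clip, which is where the "rapid iteration" benefit comes from.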
Lightricks also introduced a set of upscaler models released alongside the main checkpoint:
| Upscaler | Function |
|---|---|
| ltx-2.3-spatial-upscaler-x2-1.0 | 2x spatial resolution increase |
| ltx-2.3-spatial-upscaler-x1.5-1.0 | 1.5x spatial resolution increase |
| ltx-2.3-temporal-upscaler-x2-1.0 | 2x frame rate increase |
The spatial upscalers allow creators to generate at a manageable resolution and scale up afterward, while the temporal upscaler doubles the frame rate of existing clips. Used in combination, these tools make high resolution, high frame rate output more accessible on consumer hardware.
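The two-stage workflow is easy to reason about as simple multiplication of the base clip's dimensions. A small sketch (the base resolution and frame rate below are illustrative, not official defaults):

```python
def upscaled_output(width: int, height: int, fps: float,
                    spatial_factor: float = 1.0,
                    temporal_factor: float = 1.0) -> tuple:
    """Output shape after applying LTX-2.3 upscalers to a generated clip.

    spatial_factor: 2.0 or 1.5, matching the two spatial upscalers.
    temporal_factor: 2.0 for the temporal upscaler.
    """
    return (int(width * spatial_factor),
            int(height * spatial_factor),
            int(fps * temporal_factor))

# Generate at a manageable base resolution, then upscale afterward.
base = (1280, 704, 25)  # assumed base settings for illustration
print(upscaled_output(*base, spatial_factor=2.0, temporal_factor=2.0))
# (2560, 1408, 50)
```

Chaining the 2x spatial and 2x temporal upscalers turns the assumed 1280x704 at 25 fps base clip into 2560x1408 at 50 fps, without the base model ever sampling at that cost.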
Model Variants
LTX-2.3 ships as four sets of checkpoints:
ltx-2.3-22b-dev: The full model in bf16 precision. This is the flexible, trainable base intended for fine-tuning, LoRA training, and research workflows.
ltx-2.3-22b-distilled: The 8-step distilled version for faster inference. Lower memory overhead and significantly quicker generation times compared to the dev model.
ltx-2.3-22b-distilled-lora-384: A LoRA adapter that applies distillation behavior to the dev model. Useful if you want the full model's quality ceiling with faster sampling.
Upscaler models: The three upscaler checkpoints described above, applied as a post-processing step after generation.
Training and Fine-Tuning
The dev model is fully trainable. Lightricks provides reproducible LoRA and IC-LoRA training through the LTX-2 Trainer, with the company noting that motion, style, and likeness training can complete in under an hour in many configurations. This puts custom model training within reach for individual creators and small studios, not just large teams with dedicated compute.
Technical Requirements
LTX-2.3 requires Python 3.12 or newer, a CUDA version above 12.7, and PyTorch 2.7. Input resolutions must be divisible by 32, and frame counts must be a multiple of 8 plus 1 (for example 25, 49, or 121 frames). The model can be run through the official PyTorch codebase or through ComfyUI using the built-in LTXVideo nodes.
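Catching shape mistakes before a long generation run saves time. A minimal sketch of a validator for the two constraints above (the function name is ours, not part of the official codebase):

```python
def check_generation_shape(width: int, height: int, num_frames: int) -> None:
    """Validate LTX-2.3 input constraints: spatial dimensions divisible
    by 32, frame count of the form 8n + 1."""
    if width % 32 != 0 or height % 32 != 0:
        raise ValueError(
            f"width and height must be divisible by 32, got {width}x{height}")
    if num_frames % 8 != 1:
        raise ValueError(
            f"frame count must be a multiple of 8 plus 1, got {num_frames}")

check_generation_shape(1280, 704, 121)  # OK: 121 = 8 * 15 + 1
```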
Diffusers support is listed as coming soon, which will broaden compatibility with the wider Python AI tooling ecosystem.
Running LTX-2.3 Locally
For ComfyUI users, the LTXVideo nodes are available through ComfyUI Manager and the official documentation at docs.ltx.video. For direct Python usage:
```shell
git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2
uv sync
source .venv/bin/activate
```
From there, the inference scripts handle both the dev and distilled checkpoints, with the upscalers applied as a second stage.
A live demo is available at the LTX-2.3 API Playground for testing generation without a local setup.
Generate Video Without a Local Setup
If you want to generate AI video without installing anything locally, AI FILMS Studio lets you create videos in the browser with no environment setup required. The platform currently runs LTX-2, the previous version from Lightricks, which already delivers strong results for text-to-video and image-to-video workflows.
For context on what LTX-2 offers and how it compares to the 2.3 update, see our earlier coverage of LTX-2. If you want to run the model locally on consumer hardware, our LTX-2 4K RTX GPU setup guide walks through ComfyUI installation and configuration.
