
DomainShuttle: Subject-Driven Text-to-Video with In-Domain and Cross-Domain Generation

June 27, 2026

Researchers at the Hong Kong University of Science and Technology have released DomainShuttle, a text-to-video model that generates scenes featuring a specific subject from a single reference image. The model offers two distinct generation modes: one that preserves the subject's exact appearance across every frame, and one that intentionally allows the subject's style to shift while keeping its identity intact.

Figure: DomainShuttle subject-driven text-to-video generation overview

DomainShuttle is built on the Wan2.2-T2V-A14B backbone, an open mixture-of-experts video model with 14 billion active parameters per denoising step. DomainShuttle is released under the Apache 2.0 license, which permits commercial use. The paper was submitted to arXiv on June 24, 2026, and the model received 62 Hugging Face upvotes within three days of release.

Two Modes, One Reference Image

Most subject-driven video generation models make a fixed trade-off: either the generated subject looks exactly like the reference (useful for cast consistency and product placement) or the model allows enough variation to produce stylized results (useful for fantasy sequences and animated segments). The two goals are in tension because preserving identity means suppressing style variation, and enabling style variation means loosening identity preservation.

DomainShuttle resolves this by separating the two as distinct operating modes. A filmmaker provides a single reference image and a text prompt, then selects which mode applies to the shot. The model handles both from the same architecture without retraining.

In-Domain Generation: Exact Subject Fidelity

In-domain mode is designed for shots where the subject must look identical to the reference across every frame, regardless of background, lighting, or camera angle. This is the mode for character consistency across a sequence of shots, or for product placement where brand fidelity is required.

Figure: In-domain generation with multiple objects preserved across frames

Figure: In-domain generation with human subject and object fidelity

The multi-object example shows two reference subjects maintained simultaneously across the generated frames, each retaining its specific features as the scene changes around it. The human subject example preserves facial features and clothing texture through motion, demonstrating the kind of consistency required for a believable performance across cuts.

Cross-Domain Generation: Style Variation with Subject Consistency

Cross-domain mode allows the generated output to exist in a different visual register from the reference image while the subject's core identity persists across both. The reference provides the subject's fundamental structure; the text prompt describes the stylistic world it should inhabit.

Figure: Cross-domain generation from a fantasy reference to photorealistic output

Figure: Cross-domain generation from a photorealistic reference to fantasy output

The first example converts a fantasy-styled reference into a photorealistic generated scene. The second reverses the direction, turning a photorealistic reference into a fantasy-rendered output. Both examples retain the subject's identity across the domain shift, a capability that prior open-source character-driven models have not provided in a single unified framework. Earlier work on character consistency, including approaches like BindWeave, focused on maintaining fidelity within a single visual register. Cross-domain transfer in both directions is DomainShuttle's distinguishing capability.

The Architecture

DomainShuttle introduces three technical components to achieve dual-mode generation on the Wan2.2 backbone.

The first is Domain MoT (Domain Mixture of Tasks). This component decouples domain-aware features from subject identity features during encoding. Instead of treating the reference image as a single unified signal, Domain MoT separates what the subject is from what visual world it belongs to, allowing the generation head to apply either or both independently.
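To make the idea of decoupled signals concrete, here is a minimal conceptual sketch in Python. It is not the DomainShuttle implementation: the names `Mode`, `ReferenceFeatures`, and `select_conditioning` are hypothetical, and the rule that cross-domain mode drops the reference's domain features (leaving the text prompt to set the style) is an assumption drawn from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    IN_DOMAIN = "in_domain"
    CROSS_DOMAIN = "cross_domain"


@dataclass(frozen=True)
class ReferenceFeatures:
    """Hypothetical container for the two decoupled signals."""
    identity: tuple  # what the subject is
    domain: tuple    # what visual world the reference belongs to


def select_conditioning(ref: ReferenceFeatures, mode: Mode) -> dict:
    """Return the reference feature streams passed on to generation.

    Assumption: identity features are always used; the reference's
    domain features are used only when the output should stay in the
    reference's own visual register.
    """
    conditioning = {"identity": ref.identity}
    if mode is Mode.IN_DOMAIN:
        conditioning["domain"] = ref.domain
    return conditioning
```

Keeping the two streams as separate fields is what lets a single code path serve both modes: the switch only decides which streams are forwarded.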

The second is Video Reference DualRoPE, a positional encoding scheme that separates the reference image tokens from the video generation tokens in the attention mechanism. Standard positional encodings mix reference and generation frames spatially, which can cause the model to blur subject features into background context. DualRoPE keeps them distinct throughout the generation pass.
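The general idea of giving reference tokens their own positional range can be sketched with plain rotary position embeddings (RoPE). This is an illustration only, not the paper's DualRoPE: the `assign_positions` layout and the `ref_offset` value are assumptions made for the example, while `rope_rotate` is the standard one-dimensional RoPE rotation.

```python
import math


def rope_rotate(vec, pos, base=10000.0):
    """Apply standard 1D rotary position embedding to an even-length vector."""
    if len(vec) % 2:
        raise ValueError("vector length must be even")
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Each consecutive pair of channels rotates at its own frequency.
        theta = pos * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def assign_positions(num_video_frames, num_refs, ref_offset=1000):
    """Hypothetical layout: video frames at 0..T-1, references in a far, disjoint range."""
    video_pos = list(range(num_video_frames))
    ref_pos = [ref_offset + i for i in range(num_refs)]
    return video_pos, ref_pos


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))
```

Because the attention score between two RoPE-rotated vectors depends only on the difference of their positions, placing reference tokens in a disjoint range means no reference token shares a relative offset pattern with a neighboring video frame.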

The third is Cross Pair Consistent Loss, a training objective that extracts subject features from pairs of images with different lighting, backgrounds, and poses. By training on variation, the model learns to identify which features are intrinsic to the subject and which are incidental to the capture conditions.
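A simple way to picture a pairwise consistency objective is a loss that penalizes disagreement between subject features extracted from two different captures of the same subject. The sketch below uses one minus cosine similarity, averaged over pairs. The choice of cosine distance and the function names are assumptions for illustration; the post does not give the paper's actual formulation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("feature vectors must be non-zero")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def pair_consistency_loss(pairs):
    """Mean (1 - cosine similarity) over pairs of subject feature vectors.

    Each pair holds features of the same subject extracted from two
    images that differ in lighting, background, or pose.
    """
    if not pairs:
        raise ValueError("at least one pair is required")
    return sum(1.0 - cosine_similarity(a, b) for a, b in pairs) / len(pairs)
```

Under this sketch the loss is zero when both captures yield features pointing in the same direction, so minimizing it pushes the extractor to ignore whatever changed between the two captures.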

The Wan model family underpins the generation backbone. With 14 billion active parameters, Wan2.2-T2V-A14B is among the strongest open video model foundations available as of mid-2026, and DomainShuttle extends it for subject-driven use cases without modifying the core weights.
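As a rough sizing aid for teams planning inference hardware, the sketch below estimates weight memory from a parameter count. The function name and the 2-bytes-per-parameter (16-bit) assumption are illustrative and not taken from the DomainShuttle release. The estimate covers only the parameters counted and excludes activations, the text encoder, and the VAE.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate memory for model weights alone, in GiB (2**30 bytes)."""
    if num_params < 0 or bytes_per_param <= 0:
        raise ValueError("num_params must be >= 0 and bytes_per_param > 0")
    return num_params * bytes_per_param / 2**30


# 14 billion parameters stored at 16-bit precision (assumed).
estimate = weight_memory_gib(14e9, bytes_per_param=2)
print(f"{estimate:.1f} GiB")
```

For 14 billion parameters in 16-bit precision this comes to roughly 26 GiB. A checkpoint that stores more parameters than it activates per step would need proportionally more.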

Available Under Apache 2.0

The codebase (on GitHub from HKUST-C4G) and the model weights (on Hugging Face) are released under Apache 2.0. The license permits commercial use, provided that redistributions include the license text and retain existing copyright and attribution notices. The paper gives 480p and 720p as supported inference resolutions.

The institutional backing matters for production teams assessing reliability. HKUST ranks consistently among the top computer science research institutions in Asia, and the 10-author team represents a sustained research effort rather than a solo release. The 62 Hugging Face upvotes in three days signal rapid pickup from the practitioner community.

Filmmakers building character-consistent sequences can experiment with text-to-video and image-to-video generation directly in AI FILMS Studio's video workspace.


Sources

arXiv: DomainShuttle: Revisiting Domain in Subject-Driven Video Generation
GitHub: HKUST-C4G/DomainShuttle
HuggingFace: CNcreator0331/DomainShuttle_weight
Project Page: cn-makers.github.io/DomainShuttle