EditorNodesPricingBlog

Boogu-Image-0.1 Edit-Turbo: Open Source Image Generation and Editing

July 2, 2026
Boogu-Image-0.1 Edit-Turbo: Open Source Image Generation and Editing

Share this post:

Boogu-Image-0.1 Edit-Turbo: Open Source Image Generation and Editing

Boogu-Image-0.1 is a unified image generation and editing model family released under the Apache 2.0 license. The Edit-Turbo variant, a distilled version built for fast image editing, shipped on June 30, 2026, the newest addition to a family that first launched two weeks earlier.

The model comes from an independent team, the Boogu Team, and is fully open. Weights are on HuggingFace, code and license files are on GitHub, and the project publishes its own benchmark comparisons against both open and closed source systems.

A Unified Model Trained on a Fraction of the Data

Boogu-Image-0.1 uses a 10 billion parameter unified architecture that handles both understanding and generation inside the same model, rather than pairing a separate vision encoder with a diffusion backbone. The project's own documentation frames this as a deliberate response to how closed systems like Nano Banana Pro and GPT-Image-2 achieve their results.

According to the Boogu Team, those closed systems succeed less because of any single breakthrough model and more because of a tightly unified suite of system capabilities. Their stated finding is that systematically improving a model's understanding ability, data quality, and training pipeline can close much of that gap, even under far more limited training compute.

The team states its training data scale is roughly one order of magnitude smaller than some existing open source models. That claim, data efficiency rather than raw scale, is the central engineering story behind the release, and it shows up directly in the benchmark results below.

Beautiful and Precise Photography

Boogu is built to understand photography prompts and produce images with natural lighting, coherent composition, and consistent spatial relationships, even in complex real world scenes. The project's stated goal is realism that feels engaging rather than merely technically correct.

Cinematic still frame of a golden wheat field with a dirt road, by Boogu-Image-0.1

Sample output from Boogu-Image-0.1, via the project's official gallery at boogu.org

This cinematic still frame example shows the model rendering a wide, naturalistic landscape with coherent depth and atmosphere, a scene type the project singles out as a target for its photography focused training.

The road extending toward the horizon and the figure's period appropriate clothing stay spatially and stylistically consistent across the frame, which is the kind of coherence the Boogu Team says breaks down first in weaker photography models.

The model also handles black and white photography, a format that removes color as a crutch and puts more pressure on lighting, contrast, and composition to carry the image.

Portrait oriented and monochrome outputs like this one are part of what the Boogu Team uses internally to evaluate subject fidelity, since flaws in facial structure or lighting consistency are harder to hide without color to distract from them.

Black and white photograph generated by Boogu-Image-0.1 with natural lighting

Sample output from Boogu-Image-0.1, via the project's official gallery at boogu.org

Diverse and Stable Text Rendering

Text rendering is one of the harder problems in image generation, and Boogu is built to handle a wide range of text heavy formats, from posters and stamps to documents and interface mockups, with bilingual Chinese and English support.

Bilingual Tibet travel poster typography generated by Boogu-Image-0.1

Sample output from Boogu-Image-0.1, via the project's official gallery at boogu.org

This travel typography example combines large bilingual destination lettering with scenic imagery and aligned subtitle text, a layout that requires the model to keep large and small text elements legible and correctly positioned at once.

The Boogu Team describes this category as testing readable structure and stable typography across diverse layouts, rather than just the ability to spell words correctly in isolation.

A math exam paper is a dense document rendering test. The image includes equations, multiple choice answer options, and a small geometric diagram, all of which need to remain legible and logically arranged.

This kind of structured document generation, distinct from open ended artistic prompts, is where text heavy layouts tend to break down for models that were not specifically trained on dense typography.

Math exam paper with equations and diagrams generated by Boogu-Image-0.1

Sample output from Boogu-Image-0.1, via the project's official gallery at boogu.org

Viral meme template poster with Chinese text labels generated by Boogu-Image-0.1

Sample output from Boogu-Image-0.1, via the project's official gallery at boogu.org

This meme template poster packs many small Chinese labels, captions, and illustrated scene descriptions into a single grid layout, the kind of dense, high volume text arrangement that stresses a model's typography stability at small sizes.

The project's own documentation is candid that long text, dense typography, and small fonts can still produce typos or layout drift, and that this rendering focus is currently strongest in Chinese and English specifically.

Diverse and Beautiful Stylization

Beyond photography and text, Boogu handles stylized generation across a range of aesthetics, aiming for prompt aware creative output rather than a single fixed style transfer look.

Gilded Guilin landscape painting generated by Boogu-Image-0.1

Sample output from Boogu-Image-0.1, via the project's official gallery at boogu.org

This Guilin landscape sample renders a traditional Chinese painting aesthetic with mineral pigment coloring and glowing gold contours, a stylization category the Boogu Team built around cinematic atmosphere rather than plain style filtering.

Getting the gold linework to stay coherent against the layered mountain silhouettes is the kind of detail that separates a genuine stylistic rendering from a simple color overlay applied to a generic landscape.

The coastal road example leans into high gloss lighting, with bright rim light and reflective road surfaces rendered in warm coral and gold tones. It is a demanding lighting scenario because reflections have to track the scene's geometry accurately.

Stylized outputs like this one demonstrate the same underlying photography and composition training applied to a heavier, more atmospheric visual treatment.

Shining coastal road scene with reflective light generated by Boogu-Image-0.1

Sample output from Boogu-Image-0.1, via the project's official gallery at boogu.org

How Boogu Benchmarks Against Closed and Open Models

On Qwen-Image-Bench, a recently released text-to-image benchmark, Boogu-Image-0.1 scores 53.58 overall, ranking first among the open source models the team evaluated. That score puts it ahead of Qwen-Image-2512, a 20 billion parameter open source model that scores 52.06, and HunyuanImage-3.0, an 80 billion parameter model that scores 50.81.

The comparison matters because Boogu-Image-0.1 has only 10 billion parameters. The project's own framing of this result is that competitive benchmark performance does not require scaling to a much larger parameter count, which lines up with its stated data efficiency claim.

Across the benchmark's full breakdown, quality, aesthetics, alignment, real world fidelity, and creative generation, Boogu-Image-0.1 leads every other Apache 2.0 licensed model in the comparison table, including Qwen Image 2512 and GLM Image. It trails closed source systems like GPT Image 2 and Nano Banana 2.0, which score meaningfully higher across every category.

The Apache 2.0 open source image generation space has grown crowded in 2026, with releases like Krea 2 Raw Turbo and HiDream-O1 targeting similar territory. Boogu's benchmark position among that group is a data point for anyone comparing options rather than a claim of outright superiority, since benchmark scores measure specific evaluation setups rather than every real world use case.

On a separate benchmark called ImgEdit_O, which scores image editing quality, Boogu-Image-0.1-Edit posts the highest score in the entire comparison at 4.64, ahead of every other model listed including closed source systems such as Nano Banana Pro at 4.37 and Seedream 4.5 at 4.32. The Boogu Team itself flags a caveat here, noting in its own documentation that ImgEdit does not always align well with human visual judgment and may underestimate closed source models in interactive use cases, so the result should be read with that limitation in mind rather than as an unqualified win.

Five Variants, One Family

The Boogu-Image-0.1 family spans several purpose built variants rather than a single general model. Base is the core 50 step foundation model, tuned for complex text heavy scenarios with more than 100 characters of dense typography.

Turbo is a distilled version of Base with the same parameter count, typically requiring only 3 to 4 inference steps. The project reports that the raw Turbo model can complete a single inference in under 1 second on high performance hardware. An FP8 quantized variant reduces memory requirements further for constrained deployment.

Edit handles image to image workflows, following bilingual natural language instructions for both local adjustments and larger creative transformations, though the team notes it currently focuses on photography oriented editing rather than scenarios involving large viewpoint or structural changes. Edit-Turbo, the newest variant released June 30, applies the same distillation approach to editing, targeting lower latency edit workflows while preserving instruction following for photography and text heavy edits.

The instruction guided editing approach places Boogu-Image-0.1-Edit alongside other recent open source releases in the same category, including JD.com's JoyAI-Image-Edit-Plus, which combines a large language model with a diffusion transformer for a similar class of natural language edits.

What the Team Says It Still Gets Wrong

Most model releases lead with what a system does well. Boogu-Image-0.1's documentation includes an unusually direct limitations section, and it is worth covering on its own terms rather than treating it as a footnote.

The team states that world knowledge, covering real brands, celebrities, landmarks, and complex cultural context, lags well behind closed source systems, and that the gap is difficult to measure precisely because evaluating a single landmark category alone can require thousands of test samples. Image to image consistency is flagged as another open issue. For editing tasks that require strict preservation of a subject's identity or fine details, the team says Boogu still trails systems like Seedream 5.0 and Nano Banana Pro in some in context scenarios.

Text rendering, despite being one of the model's stronger categories, is described as not fully stable for long text, dense typography, or small fonts, and the optimization work has focused specifically on Chinese and English rather than other languages. The team also acknowledges that body structure can still fail in complex multi person poses, occlusion, or unusual viewpoints, and that small faces and limbs remain a specific weak point tied to the underlying VAE component the model builds on.

Try the Model Directly

A live demo of Boogu-Image is available on HuggingFace Spaces, running on free ZeroGPU infrastructure with no login required. The interface supports both text-to-image generation and image editing, with a toggle between the Turbo and Regular generation modes.

For filmmakers evaluating open source image tools for pre production work, storyboarding, or concept art, AI FILMS Studio's image workspace provides access to a range of image generation models alongside the platform's video and audio tools in one production pipeline.


Sources

GitHub: boogu-project/Boogu-Image
HuggingFace: Boogu/Boogu-Image-0.1-Edit-Turbo
HuggingFace Space: multimodalart/Boogu-Image
Project Page: boogu.org