Show-o2¶

Unified multimodal model from Show Lab supporting understanding and generation.

Original repository: https://github.com/showlab/Show-o
Backbone key: show_o2
Capabilities: Understanding, Generation (NO Editing)

Dependencies¶

The model environment is managed via the show_o2 image defined in modal/images.py. For local setup, install the dependencies listed in model/Show-o/requirements.txt.

Flash Attention (required)¶

Show-o2 requires Flash Attention (v2.7.4). The Modal image already includes it. For local setup, install a pre-compiled wheel matching your environment — see modal/README.md for the exact environment parameters and installation instructions.

Architecture Note¶

Show-o2 uses subprocess-based inference (wrapping the original Show-o scripts). The backbone key is show_o2, but inference config filenames use the prefix show_o_ (matching the repo directory name).

Version is auto-detected from the model's config.json — models containing "Showo2" in their config are treated as Show-o2.

Inference¶

CLI¶

# Generation
PYTHONPATH=src python -m umm.cli.main infer --config configs/inference/show_o2_generation.yaml

# Understanding
PYTHONPATH=src python -m umm.cli.main infer --config configs/inference/show_o2_understanding.yaml

Python API¶

from umm.inference.pipeline import InferencePipeline
from umm.inference.multimodal_inputs import InferenceRequest

pipeline = InferencePipeline(backbone_name="show_o2", backbone_cfg={
    "model_path": "/path/to/show_o2_weights",
    "show_o_root": "/path/to/model/Show-o",
    "vae_path": "/path/to/Wan2.1_VAE.pth",
    "seed": 42,
})

# Generation
result = pipeline.run(InferenceRequest(
    backbone="show_o2", task="generation",
    prompt="A cat sitting on a rainbow",
))

# Understanding (requires image input)
result = pipeline.run(InferenceRequest(
    backbone="show_o2", task="understanding",
    prompt="Describe this image",
    images=["path/to/image.jpg"],
))

Supported Benchmarks¶

Benchmark	Config
DPG Bench	`configs/eval/dpg_bench/dpg_bench_show_o2.yaml`
GenEval	`configs/eval/geneval/geneval_show_o2.yaml`
WISE	`configs/eval/wise/wise_show_o2.yaml`
UEval	`configs/eval/ueval/ueval_show_o2.yaml`
Uni-MMMU	`configs/eval/uni_mmmu/uni_mmmu_show_o2.yaml`
MME	`configs/eval/mme/mme_show_o2.yaml`
MMMU	`configs/eval/mmmu/mmmu_show_o2.yaml`
MMBench	`configs/eval/mmbench/mmbench_show_o2.yaml`
MM-Vet	`configs/eval/mmvet/mmvet_show_o2.yaml`
MathVista	`configs/eval/mathvista/mathvista_show_o2.yaml`

# Example: run GenEval
PYTHONPATH=src python -m umm.cli.main eval --config configs/eval/geneval/geneval_show_o2.yaml

# Example: run MME
PYTHONPATH=src python -m umm.cli.main eval --config configs/eval/mme/mme_show_o2.yaml

Key Configuration Parameters¶

Generation: seed, torch_dtype, vae_path
Understanding: subprocess-based, configured via show_o_root