Supported Benchmarks¶
TorchUMM supports 11 benchmarks spanning image generation, visual understanding, and image editing.
Benchmark Reference¶
| Benchmark | Evaluates | Required Capabilities | Data Source | Data Prep |
|---|---|---|---|---|
| DPG Bench | Text-to-image detail preservation | Generation | Included in repo | Details |
| GenEval | Compositional text-to-image generation | Generation | Included in repo | Details |
| WISE | World knowledge in image generation | Generation | Included in repo | Details |
| UEval | Unified understanding + generation | Understanding + Generation | HuggingFace | Details |
| Uni-MMMU | Multimodal understanding, generation, and editing | Understanding + Generation + Editing | HuggingFace | Details |
| MME | Multimodal perception and cognition | Understanding | HuggingFace | Details |
| MMMU | Massive multimodal understanding | Understanding | HuggingFace (auto) | Details |
| MMBench | Systematic VLM evaluation | Understanding | OpenMMLab | Details |
| MM-Vet | Integrated vision-language capabilities | Understanding | GitHub | Details |
| MathVista | Mathematical reasoning with visuals | Understanding | HuggingFace | Details |
| GEdit-Bench | Image editing quality (VIEScore) | Editing | HuggingFace | Details |
Evaluation Types¶
Single-Stage Benchmarks¶
These benchmarks run generation and scoring in a single command:
- DPG Bench --- generates images and computes detail-preservation scores
- MME --- runs perception and cognition evaluation
- MMMU --- runs multimodal understanding evaluation
- MMBench --- runs systematic VLM evaluation
- MM-Vet --- runs integrated vision-language evaluation
- MathVista --- runs mathematical reasoning evaluation
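The single-stage pattern can be sketched as a single pass that predicts and scores each example in one process. This is a minimal illustration only, not TorchUMM's actual implementation; `answer` and `score` are hypothetical stand-ins for the model and the metric.

```python
# Hypothetical single-stage loop: generation and scoring happen in one
# pass, so one command/config covers the whole benchmark run.
def answer(question: str) -> str:
    # Stand-in for the model under evaluation.
    return "yes"

def score(pred: str, gold: str) -> float:
    # Stand-in for the benchmark metric (here: exact match).
    return float(pred == gold)

def run_benchmark(examples):
    # One pass: predict and score each example, then aggregate.
    scores = [score(answer(q), gold) for q, gold in examples]
    return sum(scores) / len(scores)

examples = [("Is there a cat?", "yes"), ("Is it raining?", "no")]
print(run_benchmark(examples))  # 0.5 with these stubs
```

Because everything happens in one process, a single config file is enough to drive the whole run.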
Two-Stage Benchmarks¶
These benchmarks separate generation from scoring, which allows using different models (or environments) for each stage:
- GenEval --- generate images, then score with an object detector
- WISE --- generate images, then score with Qwen VL models
- UEval --- generate text + image answers, then score with Qwen models
- Uni-MMMU --- generate outputs, then score
- GEdit-Bench --- edit images, then score with VIEScore (Qwen VL)
```bash
# Step 1: Generate
PYTHONPATH=src python -m umm.cli.main eval \
  --config configs/eval/geneval/geneval_bagel_generate.yaml

# Step 2: Score
PYTHONPATH=src python -m umm.cli.main eval \
  --config configs/eval/geneval/geneval_bagel_score.yaml
```
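The key idea behind the two-stage split is an intermediate file: stage 1 persists generations to disk, and stage 2 loads and scores them, possibly in a different environment with a different model. The sketch below illustrates that pattern with hypothetical stand-ins (`generate_image`, `score_output`); it is not TorchUMM's real generation or scoring code, which is driven by the YAML configs above.

```python
import json
import tempfile
from pathlib import Path

def generate_image(prompt: str) -> str:
    # Hypothetical stand-in for the generation model.
    return f"image_for::{prompt}"

def score_output(image: str, prompt: str) -> float:
    # Hypothetical stand-in for the scorer (e.g. a detector or VLM judge).
    return 1.0 if prompt in image else 0.0

def stage1_generate(prompts, out_path: Path) -> None:
    # Stage 1: write generations to an intermediate JSONL file so that
    # scoring can later run as a separate process or environment.
    with out_path.open("w") as f:
        for p in prompts:
            f.write(json.dumps({"prompt": p, "image": generate_image(p)}) + "\n")

def stage2_score(in_path: Path) -> float:
    # Stage 2: load the saved generations and score them independently.
    records = [json.loads(line) for line in in_path.open()]
    return sum(score_output(r["image"], r["prompt"]) for r in records) / len(records)

prompts = ["a red cube", "two dogs playing"]
with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "generations.jsonl"
    stage1_generate(prompts, path)
    print(stage2_score(path))  # 1.0 with these stubs
```

Decoupling the stages this way lets the (often heavy) scoring models run on different hardware, or later in time, without re-running generation.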
See Reproducing Results for full two-stage examples.