Bagel (BAGEL-7B-MoT)

A Mixture-of-Transformer (MoT) multimodal model supporting understanding, generation, and editing.

Dependencies

The model environment is managed via the bagel image defined in modal/images.py. For local setup, install the dependencies listed in model/Bagel/requirements.txt. The config expects bagel_root to point to model/Bagel.

Flash Attention (required)

Bagel requires Flash Attention (v2.5.8). The Modal image already includes it. For local setup, install a pre-compiled wheel matching your environment — see modal/README.md for the exact environment parameters and installation instructions.
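For a quick sanity check after a local install, a minimal sketch (not part of the repo; the helper names are illustrative) that compares the installed flash-attn version against the 2.5.8 pin:

```python
# Sketch: verify flash-attn is installed at the pinned version (2.5.8).
# The pin comes from this README; version_matches/flash_attn_ok are illustrative.
from importlib.metadata import PackageNotFoundError, version

def version_matches(installed: str, required: str = "2.5.8") -> bool:
    """Compare an installed version string against the required pin."""
    return installed == required

def flash_attn_ok() -> bool:
    """True only if flash-attn is installed at exactly the pinned version."""
    try:
        return version_matches(version("flash_attn"))
    except PackageNotFoundError:
        return False
```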

Inference

CLI

The inference configs below are pre-configured for Modal (cloud) paths. For local execution, copy the config and adjust model_path and bagel_root to your local paths.

# Generation
PYTHONPATH=src python -m umm.cli.main infer --config configs/inference/modal_bagel_generation.yaml

# Understanding
PYTHONPATH=src python -m umm.cli.main infer --config configs/inference/modal_bagel_understanding.yaml

# Editing
PYTHONPATH=src python -m umm.cli.main infer --config configs/inference/modal_bagel_editing.yaml
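As noted above, a local run only needs model_path and bagel_root changed in a copied config. A minimal sketch of that override as a dict merge (every path below is illustrative, not a real default; the authoritative values live in the YAML files listed above):

```python
# Illustrative only: the two path fields from a Modal config, overridden
# for a local run. Later keys win in the merge.
modal_paths = {
    "model_path": "/models/BAGEL-7B-MoT",   # assumed Modal volume path
    "bagel_root": "/root/model/Bagel",      # assumed Modal checkout path
}
local_overrides = {
    "model_path": "/home/me/checkpoints/BAGEL-7B-MoT",
    "bagel_root": "/home/me/repo/model/Bagel",
}
local_paths = {**modal_paths, **local_overrides}
```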

Python API

from umm.inference.pipeline import InferencePipeline
from umm.inference.multimodal_inputs import InferenceRequest

pipeline = InferencePipeline(backbone_name="bagel", backbone_cfg={
    "model_path": "/path/to/BAGEL-7B-MoT",
    "bagel_root": "/path/to/model/Bagel",
    "max_mem_per_gpu": "80GiB",
    "seed": 42,
})

# Generation
result = pipeline.run(InferenceRequest(
    backbone="bagel", task="generation",
    prompt="A cat sitting on a rainbow",
    params={"cfg_text_scale": 7.5, "cfg_img_scale": 1.5, "num_timesteps": 50},
))

# Understanding
result = pipeline.run(InferenceRequest(
    backbone="bagel", task="understanding",
    prompt="Describe this image",
    images=["path/to/image.jpg"],
    params={"max_think_token_n": 500, "do_sample": False},
))

# Editing
result = pipeline.run(InferenceRequest(
    backbone="bagel", task="editing",
    prompt="Make the sky purple",
    images=["path/to/image.jpg"],
    params={"cfg_text_scale": 7.5, "cfg_img_scale": 1.5, "num_timesteps": 50},
))
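The three calls above differ only in task, inputs, and params. One way to keep the per-task defaults in one place is a small helper; this is a sketch, not part of the umm API (TASK_DEFAULTS and request_kwargs are hypothetical — only the field names mirror the examples above):

```python
# Illustrative helper: collect the per-task default params shown above.
TASK_DEFAULTS = {
    "generation":    {"cfg_text_scale": 7.5, "cfg_img_scale": 1.5, "num_timesteps": 50},
    "editing":       {"cfg_text_scale": 7.5, "cfg_img_scale": 1.5, "num_timesteps": 50},
    "understanding": {"max_think_token_n": 500, "do_sample": False},
}

def request_kwargs(task, prompt, images=None, **overrides):
    """Build keyword arguments for InferenceRequest with task defaults applied."""
    params = {**TASK_DEFAULTS[task], **overrides}
    kwargs = {"backbone": "bagel", "task": task, "prompt": prompt, "params": params}
    if images:
        kwargs["images"] = list(images)
    return kwargs
```

A call like pipeline.run(InferenceRequest(**request_kwargs("editing", "Make the sky purple", images=["path/to/image.jpg"]))) is then equivalent to the editing example above.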

Supported Benchmarks

Benchmark     Config
DPG Bench     configs/eval/dpg_bench/dpg_bench_bagel.yaml
GenEval       configs/eval/geneval/geneval_bagel.yaml
WISE          configs/eval/wise/wise_bagel.yaml
GEdit-Bench   configs/eval/gedit/modal_gedit_bagel.yaml
UEval         configs/eval/ueval/ueval_bagel.yaml
Uni-MMMU      configs/eval/uni_mmmu/uni_mmmu_bagel.yaml
MME           configs/eval/mme/mme_bagel.yaml
MMMU          configs/eval/mmmu/mmmu_bagel.yaml
MMBench       configs/eval/mmbench/mmbench_bagel.yaml
MM-Vet        configs/eval/mmvet/mmvet_bagel.yaml
MathVista     configs/eval/mathvista/mathvista_bagel.yaml

# Example: run GenEval (two-stage, handled automatically)
PYTHONPATH=src python -m umm.cli.main eval --config configs/eval/geneval/geneval_bagel.yaml

# Example: run MME
PYTHONPATH=src python -m umm.cli.main eval --config configs/eval/mme/mme_bagel.yaml
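To queue several benchmarks from the table above, the same command line can be built and run from Python. A sketch under the assumption that each run is independent (eval_command and run_eval are illustrative helpers, not umm APIs):

```python
# Sketch: programmatic equivalent of
#   PYTHONPATH=src python -m umm.cli.main eval --config <cfg>
import os
import subprocess

def eval_command(config: str) -> list:
    """Argv for one eval run with the given config path."""
    return ["python", "-m", "umm.cli.main", "eval", "--config", config]

def run_eval(config: str) -> None:
    """Run one benchmark with PYTHONPATH=src, raising on a non-zero exit."""
    env = {**os.environ, "PYTHONPATH": "src"}
    subprocess.run(eval_command(config), env=env, check=True)
```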

Key Configuration Parameters

  • Generation / Editing: cfg_text_scale, cfg_img_scale, num_timesteps
  • Understanding: max_think_token_n, do_sample
  • Post-training: SFT, recA, IRG, UniCot (see configs/posttrain/)
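When tuning generation or editing quality, the CFG knobs above are often swept. A dict-building sketch (the sweep values are illustrative, not recommendations):

```python
# Illustrative sweep over cfg_text_scale, holding the other knobs fixed.
# Each entry can be passed as `params` to an InferenceRequest.
base = {"cfg_img_scale": 1.5, "num_timesteps": 50}
sweep = [{**base, "cfg_text_scale": s} for s in (4.0, 7.5, 10.0)]
```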