Getting Started

This guide walks through installation, basic inference, evaluation, and post-training with TorchUMM.

Installation

# Clone the repository (including model submodules)
git clone --recursive https://github.com/AIFrontierLab/TorchUMM/
cd TorchUMM

# Install the core package
pip install -e .

Per-model dependencies

Each backbone model has its own Python and PyTorch version requirements. Install only the dependencies for the model(s) you plan to use:

# Example: install Bagel dependencies
pip install -r model/Bagel/requirements.txt

For cloud execution via Modal, each model runs in an isolated container image with the correct environment, so local dependency conflicts do not arise.

CLI Usage

Inference

# Text-to-image generation with Bagel
PYTHONPATH=src python -m umm.cli.main infer \
    --config configs/inference/modal_bagel_generation.yaml

# Image understanding with Janus-Pro
PYTHONPATH=src python -m umm.cli.main infer \
    --config configs/inference/janus_pro_understanding.yaml

Evaluation

# Single-stage benchmark (DPG Bench on Bagel)
PYTHONPATH=src python -m umm.cli.main eval \
    --config configs/eval/dpg_bench/dpg_bench_bagel.yaml

# Two-stage benchmark (GenEval on Bagel: generation, then scoring)
PYTHONPATH=src python -m umm.cli.main eval \
    --config configs/eval/geneval/geneval_bagel.yaml

Post-Training

# SFT on Bagel
PYTHONPATH=src python -m umm.cli.main train \
    --config configs/posttrain/bagel_sft.yaml
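
When these commands need to be launched from a script (for example, to loop over several configs), the invocation pattern above can be wrapped in a small helper. The following is a minimal sketch, not part of TorchUMM: `build_cli_command` and `cli_env` are hypothetical names. The only details taken from this guide are the module path `umm.cli.main`, the `infer`, `eval`, and `train` subcommands, the `--config` flag, and the `PYTHONPATH=src` setting.

```python
import os
import sys

# Subcommands shown in this guide; the CLI may accept others.
KNOWN_SUBCOMMANDS = ("infer", "eval", "train")


def build_cli_command(subcommand, config_path, python=sys.executable):
    """Return the argv list for `python -m umm.cli.main <subcommand> --config <path>`."""
    if subcommand not in KNOWN_SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    return [python, "-m", "umm.cli.main", subcommand, "--config", config_path]


def cli_env(src_dir="src", base_env=None):
    """Return a copy of the environment with `src_dir` prepended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = src_dir if not existing else src_dir + os.pathsep + existing
    return env


if __name__ == "__main__":
    cmd = build_cli_command("eval", "configs/eval/dpg_bench/dpg_bench_bagel.yaml")
    print(" ".join(cmd))
    # To actually launch it from the repository root:
    # import subprocess; subprocess.run(cmd, env=cli_env(), check=True)
```

Building the command as a list rather than a shell string avoids quoting problems when config paths contain spaces.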

Python API

TorchUMM exposes a programmatic interface through InferencePipeline and InferenceRequest.

from umm.inference.pipeline import InferencePipeline
from umm.inference.multimodal_inputs import InferenceRequest

# Initialize the pipeline with a backbone model
pipeline = InferencePipeline(
    backbone_name="bagel",
    backbone_cfg={
        "model_path": "/path/to/BAGEL-7B-MoT",
        "max_mem_per_gpu": "80GiB",
        "seed": 42,
    },
)

# Text-to-image generation
result = pipeline.run(InferenceRequest(
    backbone="bagel",
    task="generation",
    prompt="A cat sitting on a rainbow",
    params={"num_timesteps": 50},
))

# Image understanding
result = pipeline.run(InferenceRequest(
    backbone="bagel",
    task="understanding",
    prompt="Describe this image in detail.",
    images=["path/to/image.jpg"],
    params={"max_think_token_n": 500, "do_sample": False},
))

# Image editing
result = pipeline.run(InferenceRequest(
    backbone="bagel",
    task="editing",
    prompt="Make the sky purple",
    images=["path/to/image.jpg"],
    params={"num_timesteps": 25},
))

# Batch inference (request1..request3 are InferenceRequest objects built as above)
results = pipeline.run_many(
    [request1, request2, request3],
    batch_size=2,
)
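
For batch inference over many prompts, it can be convenient to generate the request arguments programmatically. The sketch below builds plain keyword-argument dictionaries and does not import TorchUMM. `generation_request_kwargs` is a hypothetical helper, and its keys simply mirror the keyword arguments used in the text-to-image example above.

```python
def generation_request_kwargs(prompts, backbone="bagel", num_timesteps=50):
    """Build one keyword-argument dict per prompt for a text-to-image request."""
    return [
        {
            "backbone": backbone,
            "task": "generation",
            "prompt": prompt,
            # Each request gets its own params dict so later edits stay independent.
            "params": {"num_timesteps": num_timesteps},
        }
        for prompt in prompts
    ]


if __name__ == "__main__":
    batch = generation_request_kwargs(
        ["A cat sitting on a rainbow", "A red bicycle", "A snowy harbor"]
    )
    print(len(batch), batch[0]["task"])
```

With TorchUMM installed, each dictionary could then be expanded into the constructor shown earlier, for example `requests = [InferenceRequest(**kw) for kw in batch]`, and the resulting list passed to `pipeline.run_many(requests, batch_size=2)`.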

InferenceRequest Fields

| Field | Type | Description |
| --- | --- | --- |
| `backbone` | `str` | Backbone model name (must match the pipeline) |
| `task` | `str` | `"generation"`, `"understanding"`, or `"editing"` |
| `prompt` | `str` | Text prompt |
| `images` | `list[str]` | Input image paths (for understanding/editing) |
| `videos` | `list[str]` | Input video paths |
| `params` | `dict` | Task-specific parameters |
| `output_path` | `str` | Path to save output |
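
The constraints implied by the field table can be checked before a request reaches the pipeline. The sketch below uses a hypothetical stand-in dataclass, `RequestSpec`, rather than the real `InferenceRequest`, so it runs without TorchUMM. The default values and the rule that understanding and editing tasks need at least one image or video are assumptions drawn from the table, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import Optional

VALID_TASKS = ("generation", "understanding", "editing")
# Assumption: these tasks need at least one visual input (image or video).
TASKS_NEEDING_INPUT = ("understanding", "editing")


@dataclass
class RequestSpec:
    """Hypothetical stand-in mirroring the InferenceRequest field table."""
    backbone: str
    task: str
    prompt: str
    images: list = field(default_factory=list)
    videos: list = field(default_factory=list)
    params: dict = field(default_factory=dict)
    output_path: Optional[str] = None


def validate_request(spec, pipeline_backbone):
    """Return a list of problems; an empty list means the spec looks consistent."""
    problems = []
    if spec.backbone != pipeline_backbone:
        problems.append(
            f"backbone {spec.backbone!r} does not match pipeline {pipeline_backbone!r}"
        )
    if spec.task not in VALID_TASKS:
        problems.append(f"task {spec.task!r} is not one of {VALID_TASKS}")
    if spec.task in TASKS_NEEDING_INPUT and not (spec.images or spec.videos):
        problems.append(f"task {spec.task!r} needs at least one image or video path")
    return problems


if __name__ == "__main__":
    spec = RequestSpec(backbone="bagel", task="editing", prompt="Make the sky purple")
    for problem in validate_request(spec, pipeline_backbone="bagel"):
        print(problem)
```

Returning a list of problems instead of raising on the first one makes it easier to report every issue in a large batch at once.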

Cloud Execution

TorchUMM integrates with Modal for cloud GPU execution. The integration handles environment isolation, model weight caching, and GPU scaling automatically.

# Download model weights to cloud storage (one-time)
modal run modal/download.py --model bagel

# Run evaluation on cloud GPU
modal run modal/run.py --model bagel --eval-config modal_dpg_bench_bagel

See the Cloud (Modal) page for setup and full command reference.

AMD HPC Execution

For AMD ROCm clusters, use the eval configs prefixed with amd_ together with the local runner:

# Setup environment (one-time)
bash scripts/amd_migration/setup_all_envs.sh bagel

# Run evaluation
bash scripts/amd_migration/local_run.sh bagel --eval-config amd_ueval_bagel

Config files follow a naming convention: modal_*.yaml for cloud runs, amd_*.yaml for AMD HPC, and unprefixed *.yaml for legacy local runs. To regenerate the AMD configs, run python scripts/generate_amd_configs.py.
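
The naming convention lends itself to a quick programmatic check, for instance to catch a cloud config being passed to the AMD runner. The following is a minimal sketch under that convention only. `classify_config` and its return labels are hypothetical and do not correspond to any TorchUMM function.

```python
from pathlib import PurePosixPath


def classify_config(path_or_name):
    """Map a config path or bare config name to its target, based on its prefix."""
    name = PurePosixPath(path_or_name).name
    if name.startswith("modal_"):
        return "cloud"
    if name.startswith("amd_"):
        return "amd_hpc"
    # Anything without a recognized prefix is treated as a legacy local config.
    return "legacy_local"


if __name__ == "__main__":
    for candidate in (
        "configs/inference/modal_bagel_generation.yaml",
        "amd_ueval_bagel",
        "configs/eval/geneval/geneval_bagel.yaml",
    ):
        print(candidate, "->", classify_config(candidate))
```

Only the final path component is inspected, so a directory that happens to start with `modal_` or `amd_` does not affect the result.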