Post-Training Methods

TorchUMM supports multiple post-training strategies for fine-tuning multimodal models (currently targeting Bagel). All methods are configured via YAML files and launched through the CLI.

Supported Methods

Method	Description	Config
SFT	Supervised fine-tuning on task-specific data	`configs/posttrain/bagel_sft.yaml`
IRG	Interleaved Reasoning Generation: 2-stage curriculum training	`configs/posttrain/irg_stage1.yaml` / `configs/posttrain/irg_stage2.yaml`
recA	Reconstruction Alignment — trains generation with alignment signal	`configs/posttrain/recA.yaml`
UniCot	Unified Chain-of-Thought training using LoRA	`configs/posttrain/unicot.yaml`

LoRA

LoRA is used internally by UniCot (`lora_rank=256`, `lora_alpha=512`); it is not a standalone training method.

Usage

CLI

# SFT on Bagel
PYTHONPATH=src python -m umm.cli.main train \
    --config configs/posttrain/bagel_sft.yaml

# IRG Stage 1
PYTHONPATH=src python -m umm.cli.main train \
    --config configs/posttrain/irg_stage1.yaml

# IRG Stage 2
PYTHONPATH=src python -m umm.cli.main train \
    --config configs/posttrain/irg_stage2.yaml

# recA
PYTHONPATH=src python -m umm.cli.main train \
    --config configs/posttrain/recA.yaml

# UniCot (LoRA-based)
PYTHONPATH=src python -m umm.cli.main train \
    --config configs/posttrain/unicot.yaml
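Because IRG is a 2-stage curriculum, the two stage commands above are normally run in order. The following is a minimal sketch of a wrapper that does this from Python. The helper names `train_command` and `run_irg_stages` are hypothetical and not part of TorchUMM; the sketch only reproduces the CLI invocations shown above and assumes it is started from the repository root.

```python
import os
import subprocess
import sys

# Config paths taken from the CLI examples above, in curriculum order.
IRG_CONFIGS = [
    "configs/posttrain/irg_stage1.yaml",
    "configs/posttrain/irg_stage2.yaml",
]


def train_command(config_path):
    """Argv equivalent of `python -m umm.cli.main train --config <path>`."""
    return [sys.executable, "-m", "umm.cli.main", "train", "--config", config_path]


def run_irg_stages():
    """Run stage 1, then stage 2 (hypothetical helper).

    check=True raises CalledProcessError on a non-zero exit code,
    so stage 2 is not started if stage 1 fails.
    """
    env = {**os.environ, "PYTHONPATH": "src"}  # mirrors `PYTHONPATH=src` above
    for config_path in IRG_CONFIGS:
        subprocess.run(train_command(config_path), env=env, check=True)


# Call run_irg_stages() from the repository root to launch both stages.
```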

Config Structure

Post-training configs follow this structure:

train:
  pipeline: bagel          # selects the training pipeline (bagel | recA | unicot | irg)
  cwd: src/umm/post_training/sft/bagel
  torchrun:
    nnodes: 1
    nproc_per_node: 4      # processes per node (typically one per GPU)
  script: train/pretrain_unified_navit.py
  env:
    PYTHONPATH: .
  args:
    model_path: /model_cache/bagel/BAGEL-7B-MoT
    lr: 2e-5
    save_every: 1000
    results_dir: /checkpoints/bagel_sft

Key fields:

Field	Description
`pipeline`	Selects the training dispatcher (`bagel`, `recA`, `unicot`, `irg`)
`cwd`	Working directory for the training script
`torchrun`	Multi-GPU launch params (`nnodes`, `nproc_per_node`)
`script`	Training script path (relative to `cwd`)
`env`	Environment variables (e.g., `PYTHONPATH`)
`args`	Training hyperparameters forwarded as CLI flags
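To make the role of each field concrete, here is a minimal sketch of how a `train:` block could be turned into a launch command. It is not TorchUMM's actual implementation: the function name `build_train_command` is hypothetical, and the `--key value` flag format for `args` is an assumption about how the hyperparameters are forwarded. The `--nnodes` and `--nproc_per_node` options are standard `torchrun` flags.

```python
import shlex
from typing import List


def build_train_command(train_cfg: dict) -> List[str]:
    """Assemble an argv list from a `train:` config block (illustrative only)."""
    torchrun = train_cfg.get("torchrun")
    if torchrun:
        cmd = [
            "torchrun",
            f"--nnodes={torchrun.get('nnodes', 1)}",
            f"--nproc_per_node={torchrun.get('nproc_per_node', 1)}",
        ]
    else:
        # Assumption: without a `torchrun` block, fall back to plain python.
        cmd = ["python"]
    cmd.append(train_cfg["script"])  # path is relative to `cwd`
    # Assumption: each entry in `args` becomes a `--key value` pair.
    for key, value in train_cfg.get("args", {}).items():
        cmd += [f"--{key}", str(value)]
    return cmd


# The example config from this page, written as the dict a YAML loader would yield.
train_cfg = {
    "pipeline": "bagel",
    "cwd": "src/umm/post_training/sft/bagel",
    "torchrun": {"nnodes": 1, "nproc_per_node": 4},
    "script": "train/pretrain_unified_navit.py",
    "env": {"PYTHONPATH": "."},
    "args": {
        "model_path": "/model_cache/bagel/BAGEL-7B-MoT",
        "lr": "2e-5",
        "save_every": 1000,
        "results_dir": "/checkpoints/bagel_sft",
    },
}

print(shlex.join(build_train_command(train_cfg)))
```

In a launcher built along these lines, `cwd` and `env` would not appear in the command itself; they would be passed to the process call, for example `subprocess.run(cmd, cwd=train_cfg["cwd"], env={**os.environ, **train_cfg["env"]})`.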

Pipeline Dispatch Logic

The CLI routes each run to the training pipeline named by the `pipeline` field:

`pipeline` value	Handler
`bagel`	`umm.post_training.sft.bagel.pipeline.run_bagel_train`
`recA`	`umm.post_training.recA.pipeline.run_reca_train`
`unicot`	`umm.post_training.unicot.pipeline.run_unicot_train`
`irg`	`umm.post_training.IRG.pipeline.run_irg_train`

Each pipeline translates the config dict into a `torchrun` or `python` subprocess and executes the training script inside the corresponding model repo directory.
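The dispatch table above can be pictured as a name-to-handler lookup. The sketch below is an assumption about the general shape of such a dispatcher, not a copy of TorchUMM's code: `HANDLERS`, `split_handler_path`, and `resolve_handler` are hypothetical names, and only the dotted handler paths come from the table. The handlers' call signatures are not documented here, so the sketch stops at resolving the function.

```python
import importlib
from typing import Callable, Tuple

# Dotted handler paths copied from the dispatch table.
HANDLERS = {
    "bagel": "umm.post_training.sft.bagel.pipeline.run_bagel_train",
    "recA": "umm.post_training.recA.pipeline.run_reca_train",
    "unicot": "umm.post_training.unicot.pipeline.run_unicot_train",
    "irg": "umm.post_training.IRG.pipeline.run_irg_train",
}


def split_handler_path(pipeline: str) -> Tuple[str, str]:
    """Return (module_path, function_name) for a `pipeline` value."""
    try:
        dotted = HANDLERS[pipeline]
    except KeyError:
        raise ValueError(
            f"unknown pipeline {pipeline!r}; expected one of {sorted(HANDLERS)}"
        ) from None
    module_path, _, func_name = dotted.rpartition(".")
    return module_path, func_name


def resolve_handler(pipeline: str) -> Callable:
    """Import the handler lazily.

    Needs TorchUMM's `src` directory on PYTHONPATH; importing only the
    selected pipeline avoids loading the other pipelines' dependencies.
    """
    module_path, func_name = split_handler_path(pipeline)
    return getattr(importlib.import_module(module_path), func_name)


print(split_handler_path("irg"))
```

Note that the lookup is case-sensitive: `recA` is a valid key while `reca` is not, which matches the spelling used in the configs.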

Cloud Post-Training

To run post-training on cloud GPUs via Modal:

modal run modal/train.py --config bagel_sft

See the Cloud (Modal) page for setup and additional details.