Skip to content

Janus-Pro

Scaled-up multimodal model from DeepSeek (7B) with improved training and stronger multimodal capabilities.

Supported Variants

Variant HuggingFace Parameters
Janus-Pro-7B deepseek-ai/Janus-Pro-7B 7B

Janus-Pro shares the same backbone adapter (janus_pro) and architecture as the original Janus (1.3B), but at larger scale with better performance. See also JanusFlow for the rectified flow variant.

Dependencies

The model environment is managed via the janus_pro image defined in modal/images.py. For local setup, install the dependencies listed in model/Janus/requirements.txt.

Flash Attention (required)

Janus-Pro requires Flash Attention (v2.7.4). The Modal image already includes it. For local setup, install a pre-compiled wheel matching your environment — see modal/README.md for the exact environment parameters and installation instructions.

Architecture Note

Janus-Pro uses parallel generation: a single forward pass produces multiple images simultaneously (parallel_size=4). Generation uses classifier-free guidance (cfg_weight=5.0) with 576 image tokens per image.

Inference

CLI

# Generation
PYTHONPATH=src python -m umm.cli.main infer --config configs/inference/janus_pro_generation.yaml

# Understanding
PYTHONPATH=src python -m umm.cli.main infer --config configs/inference/janus_pro_understanding.yaml

Python API

from umm.inference.pipeline import InferencePipeline
from umm.inference.multimodal_inputs import InferenceRequest

pipeline = InferencePipeline(backbone_name="janus_pro", backbone_cfg={
    "model_path": "/path/to/janus_pro_weights",
    "janus_root": "/path/to/model/Janus",
    "seed": 42,
    "torch_dtype": "bfloat16",
})

# Generation
result = pipeline.run(InferenceRequest(
    backbone="janus_pro", task="generation",
    prompt="A cat sitting on a rainbow",
))

# Understanding
result = pipeline.run(InferenceRequest(
    backbone="janus_pro", task="understanding",
    prompt="Describe this image",
    images=["path/to/image.jpg"],
))

Supported Benchmarks

Benchmark Config
DPG Bench configs/eval/dpg_bench/dpg_bench_janus_pro.yaml
GenEval configs/eval/geneval/geneval_janus_pro.yaml
WISE configs/eval/wise/wise_janus_pro.yaml
UEval configs/eval/ueval/ueval_janus_pro.yaml
Uni-MMMU configs/eval/uni_mmmu/uni_mmmu_janus_pro.yaml
MME configs/eval/mme/mme_janus_pro.yaml
MMMU configs/eval/mmmu/mmmu_janus_pro.yaml
MMBench configs/eval/mmbench/mmbench_janus_pro.yaml
MM-Vet configs/eval/mmvet/mmvet_janus_pro.yaml
MathVista configs/eval/mathvista/mathvista_janus_pro.yaml
# Example: run GenEval
PYTHONPATH=src python -m umm.cli.main eval --config configs/eval/geneval/geneval_janus_pro.yaml

# Example: run UEval
PYTHONPATH=src python -m umm.cli.main eval --config configs/eval/ueval/ueval_janus_pro.yaml

Key Configuration Parameters

  • Generation: seed, torch_dtype (cfg_weight=5.0 and parallel_size=4 are model defaults)
  • Understanding: uses VLChatProcessor for image preprocessing; returns text and sft_format