Janus-Pro¶
Scaled-up multimodal model from DeepSeek (7B) with improved training and stronger multimodal capabilities.
- Original repository: https://github.com/deepseek-ai/Janus
- Paper: Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
- Backbone key:
janus_pro - Capabilities: Understanding, Generation (NO Editing)
Supported Variants¶
| Variant | HuggingFace | Parameters |
|---|---|---|
| Janus-Pro-7B | deepseek-ai/Janus-Pro-7B |
7B |
Janus-Pro shares the same backbone adapter (janus_pro) and architecture as the original Janus (1.3B), but at larger scale with better performance. See also JanusFlow for the rectified flow variant.
Dependencies¶
The model environment is managed via the janus_pro image defined in modal/images.py. For local setup, install the dependencies listed in model/Janus/requirements.txt.
Flash Attention (required)¶
Janus-Pro requires Flash Attention (v2.7.4). The Modal image already includes it. For local setup, install a pre-compiled wheel matching your environment — see modal/README.md for the exact environment parameters and installation instructions.
Architecture Note¶
Janus-Pro uses parallel generation: a single forward pass produces multiple images simultaneously (parallel_size=4). Generation uses classifier-free guidance (cfg_weight=5.0) with 576 image tokens per image.
Inference¶
CLI¶
# Generation
PYTHONPATH=src python -m umm.cli.main infer --config configs/inference/janus_pro_generation.yaml
# Understanding
PYTHONPATH=src python -m umm.cli.main infer --config configs/inference/janus_pro_understanding.yaml
Python API¶
from umm.inference.pipeline import InferencePipeline
from umm.inference.multimodal_inputs import InferenceRequest
pipeline = InferencePipeline(backbone_name="janus_pro", backbone_cfg={
"model_path": "/path/to/janus_pro_weights",
"janus_root": "/path/to/model/Janus",
"seed": 42,
"torch_dtype": "bfloat16",
})
# Generation
result = pipeline.run(InferenceRequest(
backbone="janus_pro", task="generation",
prompt="A cat sitting on a rainbow",
))
# Understanding
result = pipeline.run(InferenceRequest(
backbone="janus_pro", task="understanding",
prompt="Describe this image",
images=["path/to/image.jpg"],
))
Supported Benchmarks¶
| Benchmark | Config |
|---|---|
| DPG Bench | configs/eval/dpg_bench/dpg_bench_janus_pro.yaml |
| GenEval | configs/eval/geneval/geneval_janus_pro.yaml |
| WISE | configs/eval/wise/wise_janus_pro.yaml |
| UEval | configs/eval/ueval/ueval_janus_pro.yaml |
| Uni-MMMU | configs/eval/uni_mmmu/uni_mmmu_janus_pro.yaml |
| MME | configs/eval/mme/mme_janus_pro.yaml |
| MMMU | configs/eval/mmmu/mmmu_janus_pro.yaml |
| MMBench | configs/eval/mmbench/mmbench_janus_pro.yaml |
| MM-Vet | configs/eval/mmvet/mmvet_janus_pro.yaml |
| MathVista | configs/eval/mathvista/mathvista_janus_pro.yaml |
# Example: run GenEval
PYTHONPATH=src python -m umm.cli.main eval --config configs/eval/geneval/geneval_janus_pro.yaml
# Example: run UEval
PYTHONPATH=src python -m umm.cli.main eval --config configs/eval/ueval/ueval_janus_pro.yaml
Key Configuration Parameters¶
- Generation:
seed,torch_dtype(cfg_weight=5.0 and parallel_size=4 are model defaults) - Understanding: uses
VLChatProcessorfor image preprocessing; returnstextandsft_format