Janus¶
Original unified multimodal model from DeepSeek with decoupled visual encoding for understanding and generation.
- Original repository: https://github.com/deepseek-ai/Janus
- Paper: Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
- Backbone key:
janus_pro - Capabilities: Understanding, Generation (NO Editing)
Architecture¶
Janus uses separate vision encoders for understanding and generation tasks, unified through a shared LLM backbone. Key design:
- Understanding encoder: SigLIP vision encoder
- Generation: Autoregressive VQ token prediction (576 discrete tokens per image)
- LLM base: DeepSeek-LLM-1.3B
Supported Variants¶
| Variant | HuggingFace | Parameters |
|---|---|---|
| Janus-1.3B | deepseek-ai/Janus-1.3B |
1.3B |
Relationship to Janus-Pro and JanusFlow¶
All three models share the same repository but differ in architecture and scale:
| Aspect | Janus | Janus-Pro | JanusFlow |
|---|---|---|---|
| Generation method | VQ autoregressive | VQ autoregressive | Rectified flow ODE |
| Parameters | 1.3B | 7B | 1.3B |
| Image tokens | 576 discrete | 576 discrete | Continuous (30 steps) |
| External VAE | No | No | SDXL VAE |
Janus and Janus-Pro share the same backbone adapter (janus_pro). Switch between them by changing model_path.
Dependencies¶
The model environment is managed via the janus_pro image defined in modal/images.py. For local setup, install the dependencies listed in model/Janus/requirements.txt.
Inference¶
Python API¶
from umm.inference.pipeline import InferencePipeline
from umm.inference.multimodal_inputs import InferenceRequest
pipeline = InferencePipeline(backbone_name="janus_pro", backbone_cfg={
"model_path": "/path/to/Janus-1.3B",
"janus_root": "/path/to/model/Janus",
"seed": 42,
"torch_dtype": "bfloat16",
})
# Generation
result = pipeline.run(InferenceRequest(
backbone="janus_pro", task="generation",
prompt="A cat sitting on a rainbow",
))
# Understanding
result = pipeline.run(InferenceRequest(
backbone="janus_pro", task="understanding",
prompt="Describe this image",
images=["path/to/image.jpg"],
))
Supported Benchmarks¶
Same configs as Janus-Pro — change model_path to Janus-1.3B:
| Benchmark | Config |
|---|---|
| DPG Bench | configs/eval/dpg_bench/dpg_bench_janus_pro.yaml |
| GenEval | configs/eval/geneval/geneval_janus_pro.yaml |
| WISE | configs/eval/wise/wise_janus_pro.yaml |
| UEval | configs/eval/ueval/ueval_janus_pro.yaml |
| Uni-MMMU | configs/eval/uni_mmmu/uni_mmmu_janus_pro.yaml |
| MME | configs/eval/mme/mme_janus_pro.yaml |
| MMMU | configs/eval/mmmu/mmmu_janus_pro.yaml |
| MMBench | configs/eval/mmbench/mmbench_janus_pro.yaml |
| MM-Vet | configs/eval/mmvet/mmvet_janus_pro.yaml |
| MathVista | configs/eval/mathvista/mathvista_janus_pro.yaml |
Key Configuration Parameters¶
- Generation:
seed,torch_dtype(cfg_weight=5.0, parallel_size=4 are model defaults) - Understanding: uses
VLChatProcessorfor image preprocessing