deepseek-ai/
$0.0005 / image
Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation. It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways while still using a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder’s roles in understanding and generation, but also makes the framework more flexible. Janus-Pro surpasses previous unified models and matches or exceeds the performance of task-specific models. Its simplicity, flexibility, and effectiveness make it a strong candidate for next-generation unified multimodal models.
Question
Question about the provided image
Seed
Random seed for reproducibility, default is random (Default: empty, 0 ≤ seed < 18446744073709552000)
Top P
Top-p sampling parameter, higher values increase diversity (Default: 0.95, 0 ≤ top_p ≤ 1)
Temperature
Temperature parameter, higher values increase randomness (Default: 0.1, 0 ≤ temperature ≤ 1)
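To illustrate how the seed, top-p, and temperature parameters above typically interact, here is a minimal sketch of temperature scaling followed by top-p (nucleus) sampling over a toy list of logits. The function name `sample_token` and the toy logits are hypothetical and chosen only for illustration; this is not the service's or the model's actual implementation, only the standard technique under the parameter ranges listed above.

```python
import math
import random


def sample_token(logits, temperature=0.1, top_p=0.95, seed=None):
    """Pick one index from `logits` with temperature scaling and top-p filtering.

    Hypothetical helper for illustration; defaults mirror the documented ones.
    """
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be in [0, 1]")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0, 1]")

    # A fixed seed makes the draw reproducible; None gives a random draw.
    rng = random.Random(seed)

    # Temperature 0 cannot be divided by, so treat it as greedy decoding.
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])

    # Lower temperature sharpens the distribution, higher flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Draw from the kept tokens, weighted by their (unnormalized) probability.
    r = rng.random() * mass
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r < acc:
            return i
    return kept[-1]
```

With the default temperature of 0.1 the distribution is sharply peaked, so the most likely token dominates; raising the temperature or top-p lets less likely tokens through, which is why both are described as increasing diversity.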
license: mit
license_name: deepseek
license_link: LICENSE
pipeline_tag: any-to-any
library_name: transformers
tags:
Janus-Pro is a unified multimodal large language model (MLLM) for understanding and generation that decouples visual encoding between the two tasks. It is built on DeepSeek-LLM-1.5b-base/DeepSeek-LLM-7b-base.
For multimodal understanding, it uses SigLIP-L as the vision encoder, which supports 384 × 384 image input. For image generation, Janus-Pro uses the tokenizer from here with a downsample rate of 16.
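A downsample rate of 16 means each 16 × 16 pixel patch corresponds to one position in the tokenizer's output grid. The sketch below works through that arithmetic. The helper name `image_token_grid` is hypothetical, and applying it to a 384 × 384 image is an assumption for illustration: the text above states 384 × 384 only for the understanding encoder's input.

```python
def image_token_grid(height, width, downsample_rate=16):
    """Return (rows, cols, total_tokens) for an image tokenizer grid.

    Hypothetical helper; assumes each token covers a square patch of
    downsample_rate x downsample_rate pixels.
    """
    if height % downsample_rate or width % downsample_rate:
        raise ValueError("image size must be divisible by the downsample rate")
    rows = height // downsample_rate
    cols = width // downsample_rate
    return rows, cols, rows * cols


# Assumed 384 x 384 image: 384 / 16 = 24 per side, 24 * 24 = 576 tokens.
print(image_token_grid(384, 384))  # (24, 24, 576)
```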
Please refer to the GitHub repository.
This code repository is licensed under the MIT License. Use of the Janus-Pro models is subject to the DeepSeek Model License.
@article{chen2025janus,
title={Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling},
author={Chen, Xiaokang and Wu, Zhiyu and Liu, Xingchao and Pan, Zizheng and Liu, Wen and Xie, Zhenda and Yu, Xingkai and Ruan, Chong},
journal={arXiv preprint arXiv:2501.17811},
year={2025}
}
If you have any questions, please raise an issue or contact us at service@deepseek.com.
© 2025 Deep Infra. All rights reserved.