deepseek-ai/Janus-Pro-1B

$0.0005 / image

Input

Please upload an image file

Settings

Question

Question about the provided image

Seed

Random seed for reproducibility; a random seed is used when left empty (Default: empty, 0 ≤ seed < 18446744073709552000)

Top P

Top-p sampling parameter, higher values increase diversity (Default: 0.95, 0 ≤ top_p ≤ 1)

Temperature

Temperature parameter, higher values increase randomness (Default: 0.1, 0 ≤ temperature ≤ 1)
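
The settings above can be collected and range-checked before a request is sent. The following is a minimal sketch, not the hosting service's client: the class name `JanusSettings`, the method `to_payload`, and the payload key names are hypothetical, and only the defaults and ranges come from the parameter descriptions. The seed limit is assumed to be 2**64, on the guess that the bound shown on the page is a rounded rendering of that value.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the documented upper bound for the seed is a rounded 2**64.
SEED_LIMIT = 2**64


@dataclass
class JanusSettings:
    """Hypothetical container for the documented settings."""

    question: str
    seed: Optional[int] = None   # empty means a random seed is chosen
    top_p: float = 0.95          # documented default
    temperature: float = 0.1     # documented default

    def __post_init__(self) -> None:
        # Enforce the ranges listed in the parameter descriptions.
        if self.seed is not None and not (0 <= self.seed < SEED_LIMIT):
            raise ValueError("seed must satisfy 0 <= seed < 2**64")
        if not (0.0 <= self.top_p <= 1.0):
            raise ValueError("top_p must satisfy 0 <= top_p <= 1")
        if not (0.0 <= self.temperature <= 1.0):
            raise ValueError("temperature must satisfy 0 <= temperature <= 1")

    def to_payload(self) -> dict:
        """Build a plain dict; key names are illustrative, not a real API."""
        payload = {
            "question": self.question,
            "top_p": self.top_p,
            "temperature": self.temperature,
        }
        if self.seed is not None:
            payload["seed"] = self.seed  # omit the key to keep the seed random
        return payload


settings = JanusSettings(question="What is shown in this image?", seed=42)
print(settings.to_payload())
```

Passing a fixed seed, as in the usage line, is what makes repeated runs reproducible; leaving it out keeps the default random behavior.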

Model Information

license: mit
license_name: deepseek
license_link: LICENSE
pipeline_tag: any-to-any
library_name: transformers
tags:

  • muiltimodal
  • text-to-image
  • unified-model

1. Introduction

Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation. It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways while still using a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder’s roles in understanding and generation but also makes the framework more flexible. Janus-Pro surpasses previous unified models and matches or exceeds the performance of task-specific models. Its simplicity, flexibility, and effectiveness make it a strong candidate for next-generation unified multimodal models.

Github Repository

2. Model Summary

Janus-Pro is a unified multimodal large language model (MLLM) for understanding and generation that uses separate visual encodings for the two tasks. It is built on DeepSeek-LLM-1.5b-base or DeepSeek-LLM-7b-base, depending on the model size.

For multimodal understanding, Janus-Pro uses SigLIP-L as the vision encoder, which supports 384 x 384 image input. For image generation, it uses the tokenizer from here with a downsample rate of 16.
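
To make the downsample rate concrete, the sketch below computes how many discrete tokens a square image maps to when each side is reduced by a factor of 16. It assumes, for illustration only, that generated images use the same 384 x 384 resolution that the text gives for understanding input; the function name `token_grid` is hypothetical.

```python
def token_grid(image_size: int, downsample_rate: int) -> tuple[int, int]:
    """Return (tokens per side, total tokens) for a square image."""
    if image_size % downsample_rate != 0:
        raise ValueError("image_size must be divisible by downsample_rate")
    side = image_size // downsample_rate
    return side, side * side


# Assumed generation resolution of 384 pixels per side, downsample rate 16.
side, total = token_grid(384, 16)
print(side, total)  # 24 576
```

Under that assumption, one image corresponds to a 24 x 24 grid, or 576 tokens, for the autoregressive transformer to produce.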

3. Quick Start

Please refer to the GitHub repository.

4. License

This code repository is licensed under the MIT License. Use of the Janus-Pro models is subject to the DeepSeek Model License.

5. Citation

@article{chen2025janus,
  title={Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling},
  author={Chen, Xiaokang and Wu, Zhiyu and Liu, Xingchao and Pan, Zizheng and Liu, Wen and Xie, Zhenda and Yu, Xingkai and Ruan, Chong},
  journal={arXiv preprint arXiv:2501.17811},
  year={2025}
}

6. Contact

If you have any questions, please raise an issue or contact us at service@deepseek.com.