Janus-Pro is a unified multimodal large language model (MLLM) for both understanding and generation. It decouples the visual encoding used for multimodal understanding from that used for generation. Janus-Pro is built on DeepSeek-LLM-1.5b-base/DeepSeek-LLM-7b-base.

For multimodal understanding, it uses SigLIP-L as the vision encoder, which supports 384 x 384 image input. For image generation, Janus-Pro uses the tokenizer from here with a downsample rate of 16.
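As a rough illustration of what a downsample rate of 16 implies, the sketch below computes the token grid for a square image. It assumes the generated image has the same 384 x 384 resolution as the understanding input, which the description does not state explicitly. `token_grid` is a hypothetical helper written for this example, not part of the Janus-Pro code.

```python
from typing import Tuple


def token_grid(image_size: int, downsample_rate: int) -> Tuple[int, int]:
    """Return (tokens per side, total tokens) for a square image.

    Hypothetical helper: each token is assumed to cover a
    downsample_rate x downsample_rate patch of pixels.
    """
    if image_size <= 0 or downsample_rate <= 0:
        raise ValueError("image_size and downsample_rate must be positive")
    if image_size % downsample_rate != 0:
        raise ValueError("image_size must be divisible by downsample_rate")
    side = image_size // downsample_rate
    return side, side * side


# Assumed 384 x 384 image with the stated downsample rate of 16.
side, total = token_grid(384, 16)
print(f"{side} x {side} grid, {total} tokens")
```

Under these assumptions each image corresponds to a 24 x 24 grid, so 576 discrete tokens per image.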

DeepSeek Janus-Pro Image Generation and Image-to-Prompt

Text-to-image

Other
