

Janus-Pro is a unified multimodal large language model (MLLM) for both understanding and generation. It decouples visual encoding, so multimodal understanding and image generation each use their own visual pathway. Janus-Pro is built on DeepSeek-LLM-1.5b-base/DeepSeek-LLM-7b-base.
For multimodal understanding, it uses SigLIP-L as the vision encoder, which supports 384 x 384 image input. For image generation, Janus-Pro uses the tokenizer from here, with a downsample rate of 16.
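
To make the downsample rate concrete, the sketch below works out how many discrete tokens a square image would map to under a spatial downsample rate of 16. It assumes the generation side also works on 384 x 384 images, which the text above states only for the understanding encoder. The helper name `token_grid` is hypothetical and is not part of any Janus-Pro API.

```python
def token_grid(image_size: int, downsample_rate: int) -> tuple[int, int]:
    """Return (tokens per side, total tokens) for a square image.

    Assumes the tokenizer reduces each spatial dimension by
    `downsample_rate` and emits one token per remaining grid cell.
    """
    if image_size % downsample_rate != 0:
        raise ValueError("image_size must be divisible by downsample_rate")
    side = image_size // downsample_rate
    return side, side * side


# Assumed resolution: 384 x 384 (stated above for understanding input only).
side, total = token_grid(384, 16)
print(f"{side} x {side} grid, {total} tokens")
```

Under these assumptions, a 384 x 384 image becomes a 24 x 24 grid, or 576 tokens per image.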