IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society

Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs.

Yunxin Li, Zhenyu Liu, Baotian Hu, Wei Wang, Yuxin Ding, Xiaochun Cao, Min Zhang

Published: 2026

DOI: 10.1109/TIP.2025.3649356

Abstract

Recent advancements in multimodal large language models (MLLMs) have achieved significant multimodal generation capabilities, akin to GPT-4. These models predominantly map visual information into language representation space, leveraging the vast kno…
