IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs.
Yunxin Li, Zhenyu Liu, Baotian Hu, Wei Wang, Yuxin Ding, Xiaochun Cao, Min Zhang
Published: 2026
DOI: 10.1109/TIP.2025.3649356
Abstract
Recent advancements in multimodal large language models (MLLMs) have achieved significant multimodal generation capabilities, akin to GPT-4. These models predominantly map visual information into language representation space, leveraging the vast kno…