Generative AI for developing foundation models in radiology and imaging: engineering perspectives.
June-Goo Lee, Sunggu Kyung, Namkug Kim
Abstract
Recent advances in generative artificial intelligence (AI) have accelerated the development of foundation models: large-scale, pre-trained systems capable of learning across modalities and tasks with minimal supervision. In the radiology domain, where annotated data are limited and heterogeneous, generative AI plays a critical role not only in enabling self-supervised learning and synthetic data generation, but also in addressing core engineering challenges such as scalability, multimodal alignment, and data diversity. This review examines how generative models, ranging from VAEs to diffusion and autoregressive frameworks, serve as both the algorithmic and architectural backbone of medical foundation models. We explore hybrid designs that optimize sample quality, efficiency, and control, alongside representation learning techniques like masked autoencoding and contrastive learning. Further, we describe the design and training strategies of multimodal large language models (MLLMs), which integrate visual, textual, and structured clinical data for applications including report generation, segmentation, and clinical reasoning. Through case studies of models such as Med-CLIP, RetFound, M3D-LaMed, and Med-Gemini, we illustrate how generative AI enables scalable, adaptable, and privacy-conscious AI systems in medicine. Finally, we discuss ongoing challenges (hallucination, generalization, and regulatory constraints) and highlight future directions for engineering trustworthy and deployable medical AI infrastructures.