Enhancing Glaucoma Diagnosis Through Multi-Layer Transformer and Multi-Modal Feature Fusion.
Dongyang Zhao, Huihui Fang, Qi Gao, Yi Shi, Lixin Duan, Yanwu Xu
Abstract
Purpose: To develop a more accurate glaucoma grading framework that combines multiple examination modalities and thereby overcomes the limitations of single-modality diagnostic systems. Methods: This paper proposes a novel multi-modal glaucoma grading framework that classifies patients as healthy, mild glaucoma, or moderate-to-severe glaucoma. The method simulates the clinical diagnosis process by leveraging multiple examination modalities and integrating prior knowledge of ocular structure to enhance feature learning. A multi-modal feature fusion framework (M2F3) is developed, using a multi-layer transformer (MLT) to combine the modalities efficiently. A contrastive learning strategy is also employed to further improve feature learning. Results: On the Glaucoma grAding from Multi-Modality imAges (GAMMA) dataset, the proposed M2F3 glaucoma grading method achieved a substantial increase of 0.0465 in Cohen's kappa (κ) coefficient over state-of-the-art (SOTA) methods. Conclusions: By integrating multiple examination modalities and prior knowledge, the proposed multi-modal glaucoma grading framework offers a more accurate diagnostic tool and a substantial improvement over existing single-modality systems.
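Since the reported gain is expressed in Cohen's kappa, a minimal sketch of how that coefficient can be computed for a three-grade task may help in reading the result. The abstract does not say whether the unweighted or a weighted variant was used, so the function below supports both as an illustration. The name `cohen_kappa`, the label encoding (0 = healthy, 1 = mild, 2 = moderate-to-severe), and the sample labels are assumptions made for this example, not details taken from the paper.

```python
def cohen_kappa(y_true, y_pred, num_classes, weights=None):
    """Cohen's kappa between two sequences of integer labels.

    weights=None gives the unweighted coefficient; weights="quadratic"
    penalises disagreements by the squared distance between grades.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")

    n = len(y_true)
    k = num_classes

    # Observed confusion matrix: rows are reference grades, columns predictions.
    observed = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # Weight assigned to predicting grade j when the reference is grade i.
        if weights == "quadratic":
            return (i - j) ** 2 / (k - 1) ** 2
        return 0.0 if i == j else 1.0

    # Weighted disagreement actually observed vs. expected by chance.
    observed_cost = sum(
        disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k)
    )
    expected_cost = sum(
        disagreement(i, j) * row_totals[i] * col_totals[j] / n
        for i in range(k)
        for j in range(k)
    )
    if expected_cost == 0:
        raise ValueError("kappa is undefined when only one class is present")
    return 1.0 - observed_cost / expected_cost


# Hypothetical labels: 0 = healthy, 1 = mild, 2 = moderate-to-severe.
reference = [0, 0, 1, 1, 2, 2]
predicted = [0, 0, 1, 2, 2, 2]
print(round(cohen_kappa(reference, predicted, 3), 4))               # 0.75
print(round(cohen_kappa(reference, predicted, 3, "quadratic"), 4))  # 0.8889
```

Kappa equals 1 for perfect agreement and 0 for agreement no better than chance, so a difference of 0.0465 is measured on that scale rather than as a change in accuracy.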