Image-text guided fundus vessel segmentation via attention mechanism and gated residual learning.
Jianguo Xu, Qingyou Liu, Jianxin Shen, Rong Tan, Sukun Tian, Wei Chi, Weihua Yang
Abstract
Open AccessBackground: Fundus vessel segmentation is crucial for the early diagnosis of ocular diseases. However, existing deep learning-based methods, although effective for detecting coarse vessels, still face challenges in segmenting fine vessels and heavily rely on time-consuming and labor-intensive pixel-level annotations. Methods: To alleviate these limitations, this study proposes an image-text guided segmentation model enhanced with the Squeeze-and-Excitation (SE) module and gated residual learning. Concretely, the multimodal fundus vessel datasets with text labels are primarily constructed, effectively supporting our pioneering effort to successfully introduce an image-text model into fundus vessel segmentation. Secondly, an improved image-text model is meticulously designed, focusing on the following two aspects: (1) embedding the SE module in the CNN backbone to adaptively recalibrate channel weights for enhanced vessel feature representation; (2) integrating gated residual learning into the ViT backbone to dynamically regulate the information flow between image and text features. Results: Extensive quantitative and qualitative experiments on two publicly available datasets, including DRIVE and ROSE-1, demonstrate that the proposed model achieves superior segmentation performance. Specifically, on the DRIVE dataset, the model attains an F1-score of 82.01%, an accuracy of 95.72%, a sensitivity of 83.25%, and a specificity of 97.43%. On the ROSE-1 dataset, the model records an F1-score of 86.34%, an accuracy of 94.61%, a sensitivity of 90.14%, and a specificity of 95.80%. Compared with most deep learning methods, these results reveal the competitiveness of the improved model, indicating its feasibility and potential value in fundus vessel segmentation, which is expected to expand a new research approach in this field.