An Efficient Vision Mamba-Transformer Hybrid Architecture for Abdominal Multi-Organ Image Segmentation.
Fang Lu, Jingyu Xu, Qinxiu Sun, Qiong Lou
Abstract
Accurate abdominal multi-organ segmentation is essential for disease diagnosis and treatment planning. Although numerous deep-learning models have been proposed, current methods still struggle to balance segmentation accuracy with computational efficiency, particularly for images exhibiting inhomogeneous intensity distributions and complex anatomical structures. To address these challenges, we present a hybrid framework that integrates an Efficient Vision Mamba (EViM) module into a Transformer-based encoder. The EViM module leverages hidden-state mixer-based state-space duality to enable efficient global context modelling and channel-wise interactions. In addition, a weighted combination of cross-entropy and Jaccard loss is employed to improve boundary delineation. Experimental results on the Synapse dataset demonstrate that the proposed model achieves an average Dice score of 82.67% and an HD95 of 16.36 mm, outperforming current state-of-the-art methods. Further validation on the ACDC cardiac MR dataset confirms the generalizability of our approach across imaging modalities. The results indicate that the proposed framework achieves high segmentation accuracy while effectively integrating global and local information, offering a practical and robust solution for clinical abdominal multi-organ segmentation.
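The weighted combination of cross-entropy and Jaccard loss mentioned in the abstract can be illustrated with a minimal NumPy sketch. The abstract gives neither the weighting scheme nor the exact form of the Jaccard term, so the function name `combined_loss`, the single weight `lambda_ce`, its default of 0.5, and the soft (probability-based) Jaccard formulation are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def combined_loss(logits, target, lambda_ce=0.5, eps=1e-6):
    """Weighted sum of cross-entropy and soft Jaccard loss (illustrative).

    logits:    float array of shape (C, H, W) with raw class scores.
    target:    int array of shape (H, W) with class indices in [0, C).
    lambda_ce: hypothetical weight on the cross-entropy term; the Jaccard
               term receives (1 - lambda_ce). The paper's weights are not
               stated in the abstract.
    """
    num_classes = logits.shape[0]
    probs = softmax(logits, axis=0)
    # One-hot encode the target and move the class axis first: (C, H, W).
    one_hot = np.eye(num_classes)[target].transpose(2, 0, 1)

    # Cross-entropy: mean negative log-probability of the true class.
    log_probs = np.log(np.clip(probs, eps, 1.0))
    ce = -np.mean(np.sum(one_hot * log_probs, axis=0))

    # Soft Jaccard per class: intersection over union on probabilities.
    inter = np.sum(probs * one_hot, axis=(1, 2))
    union = np.sum(probs + one_hot, axis=(1, 2)) - inter
    jaccard = 1.0 - np.mean((inter + eps) / (union + eps))

    return lambda_ce * ce + (1.0 - lambda_ce) * jaccard


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(9, 8, 8))      # e.g. 9 classes on an 8x8 slice
    target = rng.integers(0, 9, size=(8, 8))
    print(combined_loss(logits, target))
```

In this sketch the cross-entropy term penalises per-pixel misclassification, while the Jaccard term penalises poor region overlap for each class, which is one common motivation for pairing the two when boundaries matter.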