Transformer-enhanced vertebrae segmentation and anatomical variation recognition from CT images
Chao Yang, Lianghua Huang, Wiraphong Sucharit, Haorong Xie, Xingyu Huang, Ying Li
Abstract
Accurate segmentation and anatomical classification of vertebrae in spinal CT scans are crucial for clinical diagnosis, surgical planning, and disease monitoring. However, the task is complicated by anatomical variability, degenerative changes, and rare vertebral anomalies. In this study, we propose a hybrid framework that combines a high-resolution WNet segmentation backbone with a Vision Transformer (ViT)-based classification module to perform vertebral identification and anomaly detection. Our model incorporates an attention-based anatomical variation module and uses patient-specific metadata (age, sex, vertebral distribution) to improve the accuracy and personalization of vertebra typing. Extensive experiments on the VerSe 2019 and 2020 datasets show that our approach outperforms state-of-the-art baselines such as nnU-Net and Swin-Unet, especially in detecting transitional vertebrae (e.g., T13, L6) and in modeling morphological diversity. The system remains robust under slice skipping, noise perturbation, and scanner variation, and it offers interpretability through attention heatmaps and case-specific alerts. Our findings suggest that integrating anatomical priors and demographic context into transformer-based pipelines is a promising direction for personalized, intelligent spinal image analysis.
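The abstract does not say how the patient metadata enters the ViT classifier. One common way to condition a transformer on tabular context is to project the metadata into an extra token, prepend it to the image patch tokens, and let self-attention mix the two. The sketch below assumes exactly that design and is not the authors' implementation. The encoding (age divided by 100, one-hot sex, normalized cervical/thoracic/lumbar counts), the token dimension, the single attention head, and the names `encode_metadata` and `fuse_metadata_token` are all hypothetical illustrations.

```python
import numpy as np

# Hypothetical sex encoding; the paper does not describe its scheme.
SEX_ONEHOT = {"F": np.array([1.0, 0.0]), "M": np.array([0.0, 1.0])}


def encode_metadata(age_years, sex, vertebra_counts):
    """Hypothetical metadata vector: scaled age, one-hot sex, normalized counts."""
    age = np.array([age_years / 100.0])  # assumption: rough scaling to ~[0, 1]
    counts = np.asarray(vertebra_counts, dtype=float)  # e.g. (cervical, thoracic, lumbar)
    counts = counts / max(counts.sum(), 1.0)  # proportions; guards against all-zero input
    return np.concatenate([age, SEX_ONEHOT[sex], counts])


def attention(q, k, v):
    """Single-head scaled dot-product attention with a numerically stable softmax."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


def fuse_metadata_token(patch_tokens, meta_vec, w_meta, w_q, w_k, w_v):
    """Prepend a projected metadata token to the patch tokens, then self-attend."""
    meta_token = meta_vec @ w_meta  # shape (d,)
    tokens = np.vstack([meta_token[None, :], patch_tokens])  # shape (1 + n, d)
    return attention(tokens @ w_q, tokens @ w_k, tokens @ w_v)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patches = rng.normal(size=(16, 32))  # 16 hypothetical patch embeddings, d = 32
    meta = encode_metadata(54, "M", [7, 12, 5])
    w_meta = rng.normal(size=(meta.size, 32))
    w_q, w_k, w_v = (rng.normal(size=(32, 32)) for _ in range(3))
    out, weights = fuse_metadata_token(patches, meta, w_meta, w_q, w_k, w_v)
    # Row i of `weights` shows how much token i attends to the metadata token (column 0).
    print(out.shape, weights[:, 0].round(3))
```

Prepending a token keeps the patch pipeline unchanged and makes the metadata's influence directly inspectable in the attention map, which fits the abstract's emphasis on attention-based interpretability. Concatenating the metadata to the pooled image feature before the classifier head would be a simpler alternative.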