Establishment and validation of a diagnostic model for cholangiocarcinoma based on the LightGBM machine-learning algorithm
Zeyu Zhang, Xueyan Geng, Maopeng Yin, Yongyuan Liang, Guixi Zheng
Abstract
The escalating annual death toll attributed to cholangiocarcinoma (CCA) is, in part, a consequence of delayed diagnosis. This study developed an optimal CCA diagnostic model by applying 11 machine-learning algorithms. First, 105 differentially expressed genes (DEGs) were identified by analyzing gene expression profiles from 307 CCA tumor tissues and 124 adjacent non-tumor tissues. Weighted gene co-expression network analysis (WGCNA), the F-test, feature importance, and Lasso regression analysis were then used to identify key DEGs, including APOF, DIO1, APOM, and OTC. Diagnostic models were subsequently constructed from APOF, DIO1, and OTC using the 11 machine-learning algorithms. Based on ROC curve analysis and machine-learning performance evaluation, LightGBM was identified as the optimal model, achieving an AUC of 0.84 and accuracy, precision, and recall of 0.80, 0.83, and 0.90, respectively. Further analyses covered gene enrichment, protein-protein interaction (PPI) networks, and the assessment of CCA-related drugs. The study also revealed an imbalance in immune cell infiltration in CCA and identified CCL16 as a chemokine involved in immunoregulation. RT-qPCR confirmed that APOF, DIO1, and OTC were significantly downregulated in CCA tumor tissues. In conclusion, this research provides new directions for the diagnosis and immunotherapy of CCA.
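To make the reported evaluation figures (AUC, accuracy, precision, recall) concrete, the following is a minimal sketch of how these metrics are conventionally computed for a binary tumor/non-tumor classifier. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are hypothetical illustrations, not data from the study. The AUC is computed with the rank (Mann-Whitney) formulation, i.e. the probability that a randomly chosen tumor sample receives a higher score than a randomly chosen non-tumor sample.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision and recall for binary labels (1 = tumor, 0 = non-tumor)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # tumors called tumor
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normals called normal
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normals called tumor
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # tumors missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy example: model scores for 4 tumor and 2 non-tumor samples.
    labels = [1, 1, 1, 1, 0, 0]
    probs = [0.9, 0.8, 0.4, 0.7, 0.3, 0.6]
    preds = [1 if s >= 0.5 else 0 for s in probs]  # hypothetical 0.5 threshold
    print(classification_metrics(labels, preds))
    print(roc_auc(labels, probs))
```

For the toy data this gives accuracy 4/6, precision 0.75, recall 0.75, and AUC 0.875. Note that accuracy, precision, and recall depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a study may report both kinds of metric when comparing models.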