Machine learning analysis of coagulation-related genes for breast cancer diagnosis and prognosis prediction.
Shujin Li, Shuyan Liu, Yiwen Zheng, Weimin Hong, Yaoqiang Du, Xiaozhen Liu, Hongchao Tang, Xuli Meng, Qinghui Zheng
Abstract
The purpose of this study was to investigate the relationship between coagulation-related genes (CRGs) and breast cancer (BC). First, we found that most CRGs are abnormally expressed in BC patients and that their expression correlates with prognosis. We therefore examined the expression of CRGs in benign and malignant breast tissues in The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) databases, extracted the differentially expressed CRGs, and established an artificial neural network (ANN) diagnostic model to distinguish benign from malignant breast tissue, as well as a risk scoring model for prognostic assessment and risk stratification. Transcriptomic data from our own specimens confirmed the diagnostic performance of the ANN model. For the risk score model, we performed internal and external validation, using ROC curves and C-index values to test its predictive value in the TCGA and Gene Expression Omnibus (GEO) cohorts, and further established a prognostic nomogram for clinical application. In addition, we evaluated the performance of the diagnostic and prognostic models using three cross-validation methods. RABIF was further identified as a core gene and studied in more detail: RT-qPCR of BC cell lines and immunohistochemical (IHC) staining of breast tissue samples showed that RABIF is highly expressed in BC, especially in advanced BC. Our study demonstrates the value of CRGs as diagnostic and prognostic targets and may contribute to clinical decision-making in BC.
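To make the risk stratification and C-index evaluation mentioned in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear risk score (a weighted sum of gene expression values) and a median cutoff for high/low risk groups, neither of which is specified in the abstract. The gene names `GENE_A` and `GENE_B`, the coefficients, and the patient data are hypothetical placeholders chosen only for illustration.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Weighted sum of expression values over the genes in the signature."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def stratify(scores):
    """Label each patient 'high' or 'low' risk relative to the median score."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def concordance_index(times, events, scores):
    """Harrell's C-index.

    A pair (i, j) is comparable when patient i had an observed event and a
    shorter follow-up time than patient j. The pair is concordant when the
    patient with the shorter survival also has the higher risk score; tied
    scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Hypothetical coefficients and toy patients (illustration only, not study data).
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
patients = [
    {"GENE_A": 2.0, "GENE_B": 1.0},
    {"GENE_A": 1.0, "GENE_B": 2.0},
    {"GENE_A": 3.0, "GENE_B": 0.0},
    {"GENE_A": 0.0, "GENE_B": 2.0},
]
times = [20, 15, 10, 60]   # follow-up time, arbitrary units
events = [1, 1, 1, 0]      # 1 = event observed, 0 = censored

scores = [risk_score(p, coefficients) for p in patients]
groups = stratify(scores)
c_index = concordance_index(times, events, scores)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk. In practice an established survival analysis library would normally be used instead of a hand-written loop.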