Integrative multi-omics and network-based machine learning for early diagnosis of Parkinson's disease.
Wei Liu, Lina Xu, Xuejing Wang, Jiuqi Wang
Abstract
BACKGROUND: Accurate diagnosis of Parkinson's disease (PD) remains challenging due to its biological complexity. Integrating machine learning with multi-omics and network topological analyses may enhance diagnostic precision.
OBJECTIVE: To classify early-stage PD patients and healthy controls (HCs) by integrating multi-omics data and network-based machine learning.
METHODS: We analyzed 305 participants from the PPMI cohort (213 PD, 92 HCs) using DNA methylation, gene expression, and proteomic data. Feature selection was conducted through sPLS-DA, followed by integration via DIABLO. Regulatory gene networks were constructed using STRING-based topology analysis. XGBoost classifiers were trained and optimized on 244 samples and tested on the remaining 61 samples. Furthermore, we conducted external validation using data from 26 participants (12 PD, 14 HCs) from a GEO dataset, incorporating DNA methylation and gene expression profiles.
RESULTS: DIABLO-based integration identified 56 CpG sites, 61 genes, and 70 proteins. Network topology analysis revealed 59 key regulators. Among three XGBoost models (based on multi-omics signatures, topological regulators, and their combination), the multi-omics model achieved the best test-set performance (AUC 0.72, 95% CI 0.51-0.85; accuracy 0.74, 95% CI 0.61-0.82). In the external validation set, the topological regulators model performed best (AUC 0.57, 95% CI 0.37-0.73; accuracy 0.62, 95% CI 0.38-0.73).
CONCLUSION: Combining machine learning with integrative multi-omics and network topology analysis enables effective biomarker identification and PD classification, with strong potential for clinical diagnostic applications.
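The confidence intervals reported for the 61-sample test set are wide, which is expected at this sample size. As an illustration only, the sketch below computes an AUC and a percentile-bootstrap 95% CI in plain Python. The abstract does not state how the intervals were obtained, so the bootstrap is an assumption here. The labels and scores are synthetic, the 43 PD / 18 HC split merely mirrors the cohort ratio and is not taken from the study, and the function names `auc` and `bootstrap_auc_ci` are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lost one class, so the AUC is undefined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic stand-in for a 61-sample test set (assumed 43 PD, 18 HC).
rng = random.Random(42)
labels = [1] * 43 + [0] * 18
scores = [rng.gauss(0.8 if y else 0.0, 1.0) for y in labels]

point = auc(labels, scores)
lower, upper = bootstrap_auc_ci(labels, scores)
print(f"AUC {point:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

With so few controls in a test set of this size, each resample can shift the AUC noticeably, which is one plausible reason for intervals as wide as those reported.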