Machine learning-based survival prediction in colorectal cancer combining clinical and biological features.
Lucas M Vieira, Natasha A N Jorge, João B Sousa, João C Setubal, Peter F Stadler, Maria E M T Walter
Abstract
Colorectal cancer (CRC) is one of the most common and lethal types of cancer worldwide. Understanding both the biological and clinical aspects of the patient is essential to uncover the mechanisms underlying the prognosis of the disease. However, most current approaches focus primarily on either clinical or biological elements, which can limit their ability to capture the full complexity of CRC prognosis. This study aims to enhance understanding of the mechanisms of CRC by combining clinical and biological data from CRC patients with machine learning (ML) techniques to explore the importance of features and predict patient survival. First, we performed differential expression analysis and inspected patient survival curves to identify relevant biological features. Then, we applied ML techniques to understand the individual impact of each clinical and biological feature on patient survival. E2F8, WDR77, and hsa-miR-495-3p stood out as biological features, while pathological stage, age, new tumor event, lymph node count, and chemotherapy emerged as relevant clinical features. Furthermore, our ML model achieved an accuracy of 89.58% in predicting patient survival. The clinical and biological features proposed here, in conjunction with ML, can improve the interpretation of CRC mechanisms and the prediction of patient survival.
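To make the survival-curve step concrete, the following is a minimal sketch of a Kaplan-Meier estimate in plain Python. The abstract does not state which estimator or software was used, so the choice of Kaplan-Meier, the function name `kaplan_meier`, and the toy follow-up data are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient (any consistent unit)
    events:    1 if death was observed at that time, 0 if the patient was censored
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # deaths plus censorings leaving the risk set at t

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy cohort (NOT data from the study): months of follow-up
# and event flags for patients grouped by, e.g., high expression of one gene.
high_expr = kaplan_meier([5, 8, 12, 12, 20], [1, 1, 0, 1, 0])
```

In practice, one such curve would be computed per group (for example, high versus low expression of a candidate gene) and the curves compared to judge whether the feature separates patients by survival.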