CNAdjust: enhancing CNA calling accuracy through systematic baseline adjustment.
Hangjia Zhao, Michael Baudis
Abstract
Accurate determination of the genomic copy number baseline is crucial for identifying copy number alterations (CNAs) in cancer, yet it remains a significant challenge in tumors with complex karyotypes. To address this, we present CNAdjust, an integrated method to systematically detect and correct baseline inaccuracies in CNA data. CNAdjust employs a Bayesian framework that integrates cohort-specific CNA frequency priors with a data-driven plausibility score, ensuring that adjusted calls align with both biological cohort patterns and study-specific data. Performance validation using the TCGA pan-cancer dataset demonstrated improved alignment with absolute copy number estimates and enhanced CNA pattern interpretation. Furthermore, we revealed a strong correlation between chromosomal aneuploidy and baseline abnormalities, underscoring the prevalence of this issue in cancer genomics. By systematically improving the precision of CNA calls, CNAdjust serves as a critical tool for constructing harmonized reference datasets and advancing precision oncology. Its implementation as a standard, portable workflow enables reproducible and scalable analysis of large, heterogeneous datasets. Source code is available at: https://github.com/baudisgroup/CNAdjust.