Single-cell Tree-based Model for Genomic-Disease Association.
Zhikang Liu, Yiyang Niu, Tian Le, Daniel G Chen, Yapeng Su, Ye Zheng
Abstract
The rapid maturation of single-cell multi-omics technologies has enabled unprecedented resolution for mapping disease states and identifying disease-associated biomarkers. In practice, biomarkers are often discovered through differential detection that treats genomic features as independent contributors to phenotypes, while identifying the combinatorial interactions that drive clinical outcomes remains a practical challenge. We present scanCT (single-cell analysis of Clinical Tree), a tree-based framework that identifies groups of genomic features associated with distinct disease phenotypes in a highly interpretable manner. scanCT uses an unbiased, model-based variable-selection procedure for data-driven split selection, which is important for handling the diverse distributional properties of single-cell data across modalities. The tree architecture captures feature interaction effects, and the association modeling enables adjustment for confounding factors. We apply scanCT to longitudinal single-cell multi-omics COVID-19 datasets spanning diverse clinical outcomes and multiple time points per patient. scanCT identifies phenotype-specific gene and protein markers while accounting for age and sex, and it reveals interpretable synergistic marker combinations that help explain differences in patient clinical phenotypes.
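To make the idea of unbiased split selection with confounder adjustment concrete, the following is a minimal sketch and not the scanCT implementation. It assumes a generic two-step scheme in the style of conditional inference trees: first choose the split feature by a permutation test of association with the confounder-adjusted phenotype, then search for a cut point only within that feature. All function names, the single linear confounder, the correlation statistic, and the toy data (gene_a, gene_b, age) are illustrative assumptions.

```python
import random
import statistics


def residualize(y, covariate):
    """Remove the linear effect of one confounder (e.g. age) from y."""
    mx, my = statistics.fmean(covariate), statistics.fmean(y)
    sxx = sum((c - mx) ** 2 for c in covariate)
    sxy = sum((c - mx) * (v - my) for c, v in zip(covariate, y))
    slope = sxy / sxx if sxx else 0.0
    return [v - my - slope * (c - mx) for c, v in zip(covariate, y)]


def abs_corr(x, r):
    """Absolute Pearson correlation, used here as the association statistic."""
    mx, mr = statistics.fmean(x), statistics.fmean(r)
    sxx = sum((a - mx) ** 2 for a in x)
    srr = sum((b - mr) ** 2 for b in r)
    if sxx == 0 or srr == 0:
        return 0.0
    return abs(sum((a - mx) * (b - mr) for a, b in zip(x, r))) / (sxx * srr) ** 0.5


def permutation_pvalue(x, r, n_perm, rng):
    """Permutation p-value: makes no distributional assumption about x."""
    observed = abs_corr(x, r)
    shuffled = list(r)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs_corr(x, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_split(features, y, confounder, n_perm=500, seed=0):
    """Return (feature name, threshold, p-values) for one tree node."""
    rng = random.Random(seed)
    resid = residualize(y, confounder)
    # Step 1: pick the feature by p-value, so that features with many
    # candidate cut points are not favored over features with few.
    pvals = {name: permutation_pvalue(x, resid, n_perm, rng)
             for name, x in features.items()}
    best = min(pvals, key=pvals.get)
    # Step 2: search cut points only within the selected feature,
    # maximizing the between-group sum of squares of the residuals.
    x = features[best]
    best_thr, best_score = None, -1.0
    for thr in sorted(set(x))[:-1]:
        left = [r for v, r in zip(x, resid) if v <= thr]
        right = [r for v, r in zip(x, resid) if v > thr]
        diff = statistics.fmean(left) - statistics.fmean(right)
        score = len(left) * len(right) / len(x) * diff ** 2
        if score > best_score:
            best_thr, best_score = thr, score
    return best, best_thr, pvals


# Toy data (hypothetical): the phenotype depends on gene_a and on age.
rng = random.Random(7)
age = [rng.uniform(20, 80) for _ in range(40)]
gene_a = [rng.random() for _ in range(40)]
gene_b = [rng.random() for _ in range(40)]
phenotype = [(1.0 if a > 0.5 else 0.0) + 0.01 * g for a, g in zip(gene_a, age)]

name, threshold, pvals = select_split(
    {"gene_a": gene_a, "gene_b": gene_b}, phenotype, age)
print(name, threshold, pvals)
```

Applying select_split recursively to the two child groups would yield a tree whose root-to-leaf paths correspond to feature combinations, which is one way a tree can expose interaction effects. Separating feature selection from cut-point search is the design choice that avoids the selection bias of exhaustive split searches.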