Genome analysis and machine learning-based feature selection strategy reveal potential drug-resistance determinants in Nakaseomyces glabratus.
Qiqi Wang, Runhong Chen, Xin Cao, Hao Zhang, Jielin Yang, Xinlong Wang, Yadong Liu, Xinyu Tan, Tianyu Liang, Ruoyu Li, Zhe Wan, Yejun Wang, Wei Liu
Abstract
Invasive candidiasis caused by Nakaseomyces glabratus is of great concern because of its high morbidity and mortality and, in particular, its antifungal resistance. Identifying genomic signatures that are significantly linked to drug resistance is therefore important for combating this lethal disease. In this study, we performed whole-genome analysis of 109 clinical strains of N. glabratus isolated at multiple centres in China. Genome-wide association studies (GWAS) identified genomic signatures significantly linked to drug resistance, including several PDR1 mutations and genes encoding GLEYA-containing proteins. A strategy combining feature selection with machine learning (ML) identified additional relevant genomic signatures and potential resistance determinants, including the Y682C and I380L mutations in PDR1, which were further confirmed by gene editing to confer triazole resistance. We believe that the ML-based feature selection (MLFS) strategy, which builds on the comprehensive understanding of genomic characteristics described in this study, shows excellent capacity to predict resistance and to identify potential resistance determinants in N. glabratus.