dbCAN-HGM: CAZyme gene clusters in gut microbiomes of diverse human populations.
Yuchen Yan, Revanth Sai Kumar Reddy Patel, N R Siva Shanmugam, Jerry Akresi, Yanbin Yin
Abstract
CAZymes (Carbohydrate Active EnZymes) perform key metabolic functions in human gut microbiomes (HGM). Genes encoding glycan-degrading CAZymes often form physically linked CAZyme Gene Clusters (CGCs) in gut bacterial genomes. Here we developed dbCAN-HGM (https://pro.unl.edu/dbCAN_HGM), a comprehensive data repository for human gut bacterial CGCs and CAZymes. dbCAN-HGM has the following unique features: (i) 121 883 CGCs are identified in 6031 high-quality species-level representative metagenome-assembled genomes (MAGs) from a wide range of human populations, especially the under-studied African population; (ii) each CGC page includes metagenomic read mapping results across different diets (vegan, vegetarian, omnivore, flexitarian) and disease statuses (ulcerative colitis [UC] and Crohn's disease), with an interactive coverage plot and JBrowse alignment tracks; (iii) CGCs are clustered with 1358 polysaccharide utilization loci into CGC families (CGC-Fs) to infer glycan substrates; (iv) metadata and visualizations are available for CGC-Fs by substrate, taxonomy, host geographic distribution, and most abundant CAZyme families; (v) CGCs are fully annotated with CAZymes, transporters, signal transduction proteins, transcription factors, sulfatases, peptidases, Pfam families, and protein 3D structure comparison results for unannotated proteins; and (vi) a user-friendly, highly interactive web interface is provided for easy browsing and downloading of HGM genomes, CGCs, and CGC-Fs by glycan substrate and continent.