The repertoire of short tandem repeats across the tree of life.
Nikol Chantzi, Ilias Georgakopoulos-Soares
Abstract
BACKGROUND: Short tandem repeats (STRs) are widespread, dynamic repetitive elements with diverse biological functions and relevance to human disease, genome plasticity and adaptation. However, their prevalence across taxa remains poorly characterized. RESULTS: Here, we examined the prevalence and distribution of STRs across the complete genomes of 117,861 organisms spanning the tree of life. We find large differences in STR frequency between organismal genomes, and these differences are largely driven by the taxonomic group to which an organism belongs. Using simulated genomes, we find that, on average, STRs are not enriched in bacterial and archaeal genomes, suggesting that these genomes are not particularly repetitive. In contrast, eukaryotic genomes are orders of magnitude more repetitive than expected. STRs are preferentially located at functional loci in specific taxa. Finally, using the recently completed Telomere-to-Telomere genomes of human and other great apes, we find that STRs are highly abundant and variable between primate species, particularly in peri/centromeric regions. CONCLUSIONS: We conclude that STRs have expanded in eukaryotic and viral lineages but not in archaea or bacteria, resulting in large discrepancies in genomic composition.
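The comparison against simulated genomes can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`find_strs`, `str_bases`, `observed_vs_expected`), the thresholds (repeat units of 1 to 6 bp, at least 3 copies, at least 8 bp in total) and the null model (shuffling the sequence, which preserves only its base composition) are all assumptions chosen for illustration.

```python
import random
import re


def find_strs(seq, max_unit=6, min_copies=3, min_length=8):
    """Return (start, end, unit) for perfect tandem repeats in seq.

    All thresholds are illustrative assumptions, not values from the paper.
    Only uppercase A/C/G/T are considered; other symbols break a repeat.
    """
    # The lazy quantifier makes the regex try the shortest unit first,
    # so ACACACAC is reported with unit "AC" rather than "ACAC".
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq):
        if m.end() - m.start() >= min_length:
            hits.append((m.start(), m.end(), m.group(1)))
    return hits


def str_bases(seq):
    """Total number of bases covered by the STRs found in seq."""
    return sum(end - start for start, end, _ in find_strs(seq))


def observed_vs_expected(seq, n_sim=20, seed=0):
    """Compare STR bases in seq with the mean over shuffled copies of seq.

    Shuffling is an assumed, deliberately simple null model that keeps
    the base composition and destroys the order of the bases.
    """
    rng = random.Random(seed)
    observed = str_bases(seq)
    simulated = []
    for _ in range(n_sim):
        chars = list(seq)
        rng.shuffle(chars)
        simulated.append(str_bases("".join(chars)))
    expected = sum(simulated) / n_sim
    return observed, expected


if __name__ == "__main__":
    toy = "AC" * 30 + "GT" * 30  # a highly repetitive toy sequence
    obs, exp = observed_vs_expected(toy)
    print("observed STR bases:", obs, "mean in shuffled copies:", exp)
```

An observed value far above the shuffled mean corresponds to the enrichment reported for eukaryotes, while an observed value close to the shuffled mean corresponds to the pattern reported for bacteria and archaea. The two values are returned separately rather than as a ratio because the shuffled mean can be zero for short sequences.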