International Symposium on String Processing and Information Retrieval : SPIRE ... : proceedings. SPIRE (Symposium)
Prefix-free parsing for merging big BWTs.
Diego Díaz-Domínguez, Travis Gagie, Veronica Guerrini, Ben Langmead, Zsuzsanna Lipták, Giovanni Manzini, Francesco Masillo, Vikram Shivakumar
Published: 2026
DOI: 10.1007/978-3-032-05228-5_6
Abstract
Open Access
When building Burrows-Wheeler Transforms (BWTs) of truly huge datasets, prefix-free parsing (PFP) can use an unreasonable amount of memory. In this paper we show that if a dataset can be broken down into small datasets that are not very similar to each other - such as collections of many copies of genomes of each of several species, or collections of many copies of each of the human chromosomes - then we can drastically reduce PFP's memory footprint by building the BWTs of the small datasets and then merging them into the BWT of the whole dataset.