Pantheon-DNA: Versatile encoding-decoding system with integrated adaptive NGS preprocessing algorithms for DNA data storage.
Adriano Galindo Leal, Thiago Yuji Aoyagi, André Guilherme Costa-Martins, Diego Trindade de Souza, Cristina Maria Ferreira da Silva, Eduardo Takeo Ueda, Marcelo Gonzaga de Oliveira Parada, Allan Eduardo Feitosa, André Fujita
Abstract
We introduce Pantheon-DNA, an end-to-end processing pipeline for DNA data storage that addresses scalability challenges and manages large datasets efficiently, maintaining ≥99.996% retrievability at 10× coverage under both LER and HER in our tests. To prevent repetitive patterns in DNA sequences, which may cause chimeras at the molecular level and hinder clustering algorithms, we propose a data arrangement scheme and a randomization procedure applied during encoding. We use a block data architecture to enhance parallel processing and retrieval. The proposed sequencing data preprocessing pipeline uses prior knowledge of the data structure encoded in the DNA sequences to simplify conventional clustering routines and reduce computational complexity. The system's robustness and reliability are validated through an actual synthesis and sequencing experiment that encodes and decodes 1.59 MB of data comprising multiple files. Future enhancements will focus on refining error correction capabilities, particularly for indel recovery, and on optimizing preprocessing efficiency and sensitivity.
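To illustrate why randomization during encoding helps avoid repetitive patterns, the following is a minimal sketch and not the Pantheon-DNA implementation. It assumes a simple scheme in which the payload is XORed with a seeded pseudorandom keystream and then mapped to nucleotides at 2 bits per base. The function names (`randomize`, `to_dna`, `from_dna`, `longest_run`), the seed value, and the 2-bit mapping are all hypothetical choices made for this example; the abstract does not specify the actual procedure.

```python
import random

BASES = "ACGT"  # assumed 2-bit mapping: 00->A, 01->C, 10->G, 11->T


def keystream(seed: int, n: int) -> bytes:
    """Generate n pseudorandom bytes from a seed (illustrative, not cryptographic)."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))


def randomize(payload: bytes, seed: int) -> bytes:
    """XOR the payload with the keystream; applying it twice restores the payload."""
    ks = keystream(seed, len(payload))
    return bytes(p ^ k for p, k in zip(payload, ks))


def to_dna(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    return "".join(BASES[(b >> shift) & 3] for b in data for shift in (6, 4, 2, 0))


def from_dna(seq: str) -> bytes:
    """Inverse of to_dna; assumes len(seq) is a multiple of 4."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for ch in seq[i:i + 4]:
            value = (value << 2) | BASES.index(ch)
        out.append(value)
    return bytes(out)


def longest_run(seq: str) -> int:
    """Length of the longest homopolymer run in the sequence."""
    best = run = 0
    prev = ""
    for ch in seq:
        run = run + 1 if ch == prev else 1
        prev = ch
        best = max(best, run)
    return best


if __name__ == "__main__":
    payload = bytes(64)  # worst case: all-zero data encodes to a single repeated base
    seed = 42            # hypothetical seed; a decoder would need the same value
    plain = to_dna(payload)
    mixed = to_dna(randomize(payload, seed))
    print("longest run without randomization:", longest_run(plain))
    print("longest run with randomization:   ", longest_run(mixed))
    # Decoding reverses both steps.
    assert randomize(from_dna(mixed), seed) == payload
```

Because XOR with the same keystream is its own inverse, the decoder only needs the seed to undo the randomization. A real system would also have to handle constraints such as GC content and indexing, which this sketch leaves out.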