RAmpSim: A Thermodynamic Simulator for Hybridization Capture in Metagenomic Sequencing.
Aidan Zhang, Christina Boucher, Noelle Noyes, Yun William Yu
Abstract
Hybridization (bait) capture combined with long-read sequencing enables targeted profiling within complex metagenomes but introduces systematic biases from bait multiplicity, sequence composition, and species abundance that existing simulators ignore. We present RAmpSim, a fast simulator that models bait-target hybridization and fragment capture using a thermodynamic nearest-neighbor energy model and Boltzmann-weighted sampling of binding sites. Fragments are generated through multinomial sampling parameterized by bait concentration, binding energy, and genomic abundance, then passed to existing long-read simulators that model platform-specific errors. Implemented in Rust, RAmpSim reproduces empirical within-genome coverage and cross-species enrichment patterns observed in capture-based metagenomic datasets. Compared to uniform-coverage baselines, RAmpSim's simulated coverage distributions are up to an order of magnitude closer to real data as measured by earth mover's distance. A classification analysis shows that RAmpSim recovers the high-coverage regions of the experimental distributions with high recall and outperforms a uniform baseline. By supporting accurate benchmarking and bait-set evaluation, RAmpSim provides an interpretable, efficient framework for simulating capture-based metagenomic sequencing.
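To make the sampling idea concrete, the following is a minimal illustrative sketch in Python, not RAmpSim's Rust implementation. It assumes that each candidate binding site already has a binding free energy (for example from a nearest-neighbor model), converts those energies to Boltzmann weights, scales them by bait concentration and genome abundance, and draws fragment counts from the resulting multinomial distribution. The function names, the field names (`delta_g`, `bait_conc`, `abundance`), the 65 °C hybridization temperature, the multiplicative combination of the three factors, and all numeric values are assumptions chosen for illustration.

```python
import math
import random

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def boltzmann_weights(delta_g, temperature_k=338.15):
    """Normalized Boltzmann weights for binding free energies in kcal/mol.

    A more negative delta_g (stronger binding) yields a larger weight.
    The default temperature (65 C) is an assumed hybridization temperature.
    """
    rt = R_KCAL * temperature_k
    # Shift by the minimum energy before exponentiating for numerical stability.
    g_min = min(delta_g)
    unnorm = [math.exp(-(g - g_min) / rt) for g in delta_g]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def capture_probabilities(sites):
    """Combine binding energy, bait concentration, and genome abundance.

    Multiplying the three factors is an assumption made for this sketch.
    """
    energy_w = boltzmann_weights([s["delta_g"] for s in sites])
    raw = [w * s["bait_conc"] * s["abundance"] for w, s in zip(energy_w, sites)]
    total = sum(raw)
    return [r / total for r in raw]


def sample_fragment_counts(probs, n_fragments, seed=0):
    """Draw one multinomial sample of fragment counts across sites."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    for idx in rng.choices(range(len(probs)), weights=probs, k=n_fragments):
        counts[idx] += 1
    return counts


# Hypothetical binding sites; every value here is made up for illustration.
sites = [
    {"delta_g": -22.0, "bait_conc": 1.0, "abundance": 0.6},
    {"delta_g": -20.0, "bait_conc": 1.0, "abundance": 0.3},
    {"delta_g": -15.0, "bait_conc": 2.0, "abundance": 0.1},
]
probs = capture_probabilities(sites)
counts = sample_fragment_counts(probs, n_fragments=10_000)
```

In this toy setup the site with the most favorable binding energy and the highest abundance receives nearly all sampled fragments, which illustrates how capture can skew coverage away from the uniform baseline.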