Benchmarking Peptide Spectral Library Search.
Hao Xu, Nuno Bandeira
Abstract
Spectral library search (SLS) is a major approach for peptide identification from tandem mass spectrometry data, and its performance depends substantially on the accuracy of the underlying Spectrum-Spectrum Matching (SSM) scoring functions. However, detailed comparative studies remain limited by the absence of comprehensive benchmark datasets. We propose new methods for building benchmarks for SSM scoring functions and construct a benchmark dataset with (i) eight query spectrum sets with varying noise levels for 476,063 precursors, and (ii) three spectral libraries with experimental, de-noised and predicted spectra for 3,065,819 precursors. We evaluate common spectrum preprocessing scenarios and SSM scoring functions, including SpectraST and EntropyScore. The results show that this remains an important open problem: the best recall is still poor at just ∼70%. SpectraST performed best for spectra with little-to-no noise, while JS-divergence showed superior noise resistance. Conversely, Cosine and Entropy score performed substantially worse, and Projected-Cosine performed especially poorly in most cases. Overall performance and relative ranking depended significantly on the minimum number of matching peaks. The benchmark dataset (MSV000095946/PXD056205) supports testing and development of new SSM scoring functions, and the proposed benchmark construction approach provides an extensible foundation for additional types of SSM evaluation.
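To make the notion of an SSM scoring function concrete, the following is a minimal sketch of two generic scores of the kind named above: a cosine score and a similarity derived from the Jensen-Shannon (JS) divergence, together with a minimum-matching-peaks filter. It is an illustration under stated assumptions, not the implementation evaluated in the benchmark. The function names (`align`, `cosine`, `js_similarity`, `score_ssm`), the greedy peak matching, the 0.05 m/z tolerance, and the default threshold of 3 matched peaks are all hypothetical choices made here, and the exact score definitions and preprocessing used in the study may differ.

```python
import math

def align(query, library, tol=0.05):
    """Greedily pair peaks whose m/z values differ by at most `tol`.

    Spectra are lists of (mz, intensity) tuples. Returns two intensity
    vectors of equal length; unmatched peaks are paired with 0.0.
    The tolerance and the greedy strategy are illustrative assumptions.
    """
    q_vec, l_vec, used = [], [], set()
    for q_mz, q_int in query:
        best_j, best_d = None, tol
        for j, (l_mz, _) in enumerate(library):
            d = abs(q_mz - l_mz)
            if j not in used and d <= best_d:
                best_j, best_d = j, d
        q_vec.append(q_int)
        if best_j is None:
            l_vec.append(0.0)
        else:
            used.add(best_j)
            l_vec.append(library[best_j][1])
    # Library peaks that no query peak claimed.
    for j, (_, l_int) in enumerate(library):
        if j not in used:
            q_vec.append(0.0)
            l_vec.append(l_int)
    return q_vec, l_vec

def cosine(q_vec, l_vec):
    """Cosine similarity of two aligned intensity vectors."""
    dot = sum(q * l for q, l in zip(q_vec, l_vec))
    nq = math.sqrt(sum(q * q for q in q_vec))
    nl = math.sqrt(sum(l * l for l in l_vec))
    return dot / (nq * nl) if nq > 0 and nl > 0 else 0.0

def js_similarity(q_vec, l_vec):
    """1 - JS divergence (log base 2) of the sum-normalized vectors."""
    sq, sl = sum(q_vec), sum(l_vec)
    if sq <= 0 or sl <= 0:
        return 0.0
    p = [q / sq for q in q_vec]
    r = [l / sl for l in l_vec]
    m = [(a + b) / 2 for a, b in zip(p, r)]

    def kl(a, b):
        # Terms with a zero numerator contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 1.0 - 0.5 * (kl(p, m) + kl(r, m))

def score_ssm(query, library, min_matched=3, tol=0.05):
    """Return both scores, or None if too few peaks match."""
    q_vec, l_vec = align(query, library, tol)
    matched = sum(1 for q, l in zip(q_vec, l_vec) if q > 0 and l > 0)
    if matched < min_matched:
        return None
    return {"cosine": cosine(q_vec, l_vec),
            "js_similarity": js_similarity(q_vec, l_vec)}
```

In this sketch, raising `min_matched` rejects more candidate matches before any score is computed, which is one simple way the minimum number of matching peaks can change both overall performance and the relative ranking of scoring functions.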