FIDDLE: a deep learning method for chemical formulas prediction from tandem mass spectra.
Yuhui Hong, Sujun Li, Yuzhen Ye, Haixu Tang
Abstract
Molecular identification through tandem mass spectrometry is fundamental in small molecule analysis, with formula identification serving as an initial step in the process. Current computational methods often struggle with accuracy, speed, and scalability for relatively larger molecules, limiting high-throughput workflows. We present FIDDLE (Formula IDentification by Deep LEarning), a deep learning-based method trained on over 38,000 molecules and 1 million MS/MS spectra from various Quadrupole Time-of-Flight (Q-TOF) and Orbitrap instruments. FIDDLE accelerates formula identification by more than 10-fold and achieves top-1 and top-5 accuracies of 88.3% and 93.6%, respectively, outperforming state-of-the-art methods based on top-down (SIRIUS) and bottom-up (BUDDY) approaches by over 10%. On external metabolomics datasets, FIDDLE achieves top-5 accuracies of 75.1% (positive ion mode) and 66.2% (negative ion mode), with further improvements to 80.0% and 73.8% when combined with SIRIUS and BUDDY.
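Top-1 and top-5 accuracy are the fractions of spectra whose correct formula is ranked first, or appears anywhere among the first five candidates. The following is a minimal Python sketch of that metric. It assumes that each method returns a ranked list of candidate formula strings written in a shared canonical notation, so that plain string comparison is valid. The function name `top_k_accuracy` and the toy formulas are hypothetical illustrations, not FIDDLE's evaluation code.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int,
) -> float:
    """Return the fraction of spectra whose true formula is among the first k candidates.

    Assumption: formulas are compared as plain strings, so both the predictions
    and the ground truth must use the same canonical formula notation.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_predictions) != len(truths):
        raise ValueError("one ranked candidate list is needed per spectrum")
    if not truths:
        raise ValueError("at least one spectrum is required")
    # Count a hit when the true formula appears in the top-k slice of its ranked list.
    hits = sum(
        1 for candidates, truth in zip(ranked_predictions, truths)
        if truth in candidates[:k]
    )
    return hits / len(truths)


# Hypothetical toy data: three spectra, each with a ranked list of candidate formulas.
predictions = [
    ["C6H12O6", "C7H14O5"],      # true formula ranked first
    ["C8H10N4O2", "C9H12N2O3"],  # true formula ranked second
    ["C5H5N5", "C4H4N4O"],       # true formula absent from the list
]
truths = ["C6H12O6", "C9H12N2O3", "C3H7NO2"]

print(top_k_accuracy(predictions, truths, k=1))  # 1 of 3 spectra correct at rank 1
print(top_k_accuracy(predictions, truths, k=2))  # 2 of 3 spectra correct within rank 2
```

Because top-k accuracy only checks membership in the first k entries, it can only stay the same or increase as k grows. This is why each reported top-5 figure is at least as high as the corresponding top-1 figure.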