bioRxiv : the preprint server for biology
Simulating Tandem Mass Spectra for Small Molecules using a General-Purpose Large-Language Model.
Tuan Nguyen, Dinesh Barupal
Published: 2025. DOI: 10.1101/2025.11.10.687298
Abstract
We show a practical application of the Google Gemini large language model (LLM) for simulating tandem mass spectra of compounds from the Blood Exposome Database. This approach bypasses the need for domain-specific model training, suggesting that chemical fragmentation knowledge may be latently encoded within the Gemini model. General-purpose LLMs represent a useful and accessible tool for expanding in-silico spectral libraries and may accelerate compound annotation in mass spectrometry-based metabolomics and exposomics.