Comparative Study of Molecular Descriptors and AI-Based Embeddings for Toxicity Prediction.
Magnus Gray, Leihong Wu
Abstract
Accurate toxicity prediction is a critical component of pharmaceutical development and regulatory safety evaluation, and it has traditionally relied on molecular descriptor-based models. This study compares descriptor-based features (Mordred, RDKit) with embeddings from ten AI language models applied to SMILES strings, chemical names, and simple descriptions, using logistic regression classifiers across the Tox21, ClinTox, and DILIst datasets. On the Tox21 dataset, Mordred achieved the highest average ROC-AUC of 0.855, outperforming the language models. On specific endpoints, however, language models were competitive, with MolBERT reaching an average ROC-AUC of 0.801 for SMILES-based embeddings. In contrast, language models outperformed descriptor models on the ClinTox dataset: RDKit achieved an ROC-AUC of 0.721, whereas GPT-3 reached 0.996 using simple descriptions. Similarly, on the DILIst dataset, language models surpassed descriptor models, with GPT-3 achieving an ROC-AUC of 0.806 using chemical names, compared with 0.620 for RDKit. These results demonstrate the promise of AI language models in predictive toxicology, particularly for specific toxicity endpoints and datasets. While molecular descriptors remain robust for multi-endpoint predictions such as Tox21, language models show superior performance on focused toxicity classifications such as ClinTox and DILIst. This study supports the future integration of molecular descriptors with textual embeddings to enhance overall performance and adaptability across diverse toxicity prediction tasks.
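Because every comparison above is reported as an ROC-AUC, and the Tox21 figures as an average across endpoints, a minimal sketch of that metric may help. It is not the authors' evaluation code: the function names `roc_auc` and `average_roc_auc`, the toy labels and scores, and the choice to skip compounds without a label for an endpoint are all assumptions made for illustration.

```python
from statistics import mean


def roc_auc(labels, scores):
    """ROC-AUC via the rank-sum (Mann-Whitney U) formulation, with tied scores sharing an average rank."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC is undefined when only one class is present")

    # Rank compounds by predicted score (ascending), averaging ranks over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the positive class, normalised by the number of pos/neg pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def average_roc_auc(per_endpoint):
    """Mean ROC-AUC over endpoints; per_endpoint maps name -> (labels, scores).

    Assumption: a label of None means the compound was not measured for that
    endpoint, so it is dropped before scoring.
    """
    aucs = {}
    for name, (labels, scores) in per_endpoint.items():
        pairs = [(y, s) for y, s in zip(labels, scores) if y is not None]
        aucs[name] = roc_auc([y for y, _ in pairs], [s for _, s in pairs])
    return mean(list(aucs.values())), aucs


# Toy values only; these are not results from the study.
toy = {
    "NR-AR": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    "SR-p53": ([0, 1, None, 1], [0.2, 0.9, 0.5, 0.7]),
}
avg, per_endpoint_auc = average_roc_auc(toy)
```

An ROC-AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of toxic from non-toxic compounds, which is the scale on which the reported values (for example 0.855 for Mordred on Tox21) should be read.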