Mechanism-Aware Deep Learning for Polar Reaction Prediction
Ryan J Miller, Alexander E Dashuta, Brayden Rudisill, David Van Vranken, Pierre Baldi
Abstract
Accurately predicting chemical reactions is essential for driving innovation in synthetic chemistry, with broad applications in medicine, manufacturing, and agriculture. Yet reaction prediction remains a complex problem that is time-consuming and resource-intensive for chemists to solve. Deep learning offers an appealing solution by enabling high-throughput prediction, but most existing models are trained on the US Patent and Trademark Office (USPTO) data set and treat reactions as recipes or overall transformations, mapping reactants directly to products with limited mechanistic insight. To address this, we introduce PMechRP (Polar Mechanistic Reaction Predictor), trained on the PMechDB data set of polar elementary steps that capture electron flow and mechanistic detail. To broaden coverage and improve generalization, we augment PMechDB with combinatorially generated reactions and train models spanning transformer, graph, and two-stage Siamese architectures. In addition to these reaction prediction models, we develop ArrowFinder, a new model that directly predicts the arrow-pushing mechanism connecting a given set of reactants and products. Our best-performing approach is a hybrid pipeline that combines an ensemble of Chemformer models with the two-stage Siamese framework: it leverages the accuracy of the transformers, uses the two-stage network to filter out "alchemical" products, and uses ArrowFinder to generate mechanistic annotations. This approach achieves strong predictive accuracy while also providing interpretable predictions. We evaluate performance on multiple benchmarks: PMechDB test splits, a curated USPTO subset from the Open Reaction Database, and a human benchmark of mechanistic pathways from an intermediate-level textbook.
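As a rough illustration of the hybrid pipeline's combine-then-filter structure, the Python sketch below pools ranked product candidates from several ensemble members and discards those a second-stage model rejects. It is not the authors' implementation: the summed reciprocal-rank aggregation rule, the function name `combine_ensemble`, and the `keep` predicate (a hypothetical stand-in for the two-stage Siamese filter) are all assumptions. Candidates are treated as plain SMILES strings, and the ArrowFinder annotation step is omitted.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def combine_ensemble(
    ranked_lists: Sequence[List[str]],
    keep: Callable[[str], bool],
    top_k: int = 3,
) -> List[str]:
    """Pool ranked product candidates from several models, then filter them.

    Hypothetical sketch: each candidate is scored by its summed reciprocal
    rank across ensemble members (an assumed aggregation rule), and any
    candidate rejected by `keep` (an assumed stand-in for a second-stage
    filter such as a two-stage Siamese model) is dropped.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, smiles in enumerate(ranking, start=1):
            scores[smiles] += 1.0 / rank
    # Sort by descending score; break ties alphabetically for determinism.
    ordered = sorted(scores, key=lambda s: (-scores[s], s))
    # Apply the (hypothetical) filter, then truncate to the top_k survivors.
    return [s for s in ordered if keep(s)][:top_k]


# Toy usage: three "models" propose products; the filter rejects one of them.
model_a = ["CCO", "CC=O", "C[Na]"]
model_b = ["CC=O", "CCO"]
model_c = ["CCO", "C[Na]"]
rejected = {"C[Na]"}
print(combine_ensemble([model_a, model_b, model_c], keep=lambda s: s not in rejected))
# → ['CCO', 'CC=O']
```

Summed reciprocal rank is one simple way to reward candidates that several models rank highly; averaging predicted probabilities would be an equally plausible choice, and the abstract does not state which aggregation, if either, PMechRP uses.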