Efficient Learning of Molecular Properties Using Graph Neural Networks Enhanced with Chemistry Knowledge.
Tetiana Lutchyn, Marie Mardal, Benjamin Ricaud
Abstract
Graph neural networks (GNNs) have emerged as a powerful tool for predicting molecular properties from structural data. While GNNs excel at identifying local patterns within molecules, their ability to capture global properties remains limited by inherent structural challenges, such as oversmoothing and limited expressivity. We build a simple GNN-based model that integrates chemistry knowledge that GNNs may have difficulty learning. We show that this combination greatly improves accuracy compared with a pure GNN approach. The model is on par with the state of the art (SOTA) set by much larger models, including large foundation models, and even outperforms them in some cases on several benchmarks. With a simple approach, this study highlights some limitations of GNNs and the crucial benefit of giving GNN models easy access to global information about the graph in chemistry applications. We focus on molecule-level regression tasks on small-molecule data sets. We also investigated whether the molecular substructures that matter for the GNN prediction can be localized using the SMILES encoding. We designed a GNN that predicts molecular properties at the node level, which allows us to identify the nodes that are important for the prediction. Additionally, the model's architecture allows for efficient training with relatively modest computational resources, making it practical for widespread application.
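To make the two ideas in the abstract concrete (giving each node access to global molecular information, and predicting at the node level so that important nodes can be identified), the following is a minimal pure-Python sketch. It is not the authors' architecture: the mean-aggregation step, the choice of global descriptors (heavy-atom count and scaled molecular weight), the fixed untrained weights, and the function names `mean_aggregate` and `node_contributions` are all assumptions made for illustration.

```python
from typing import Dict, List


def mean_aggregate(features: List[List[float]],
                   neighbors: Dict[int, List[int]]) -> List[List[float]]:
    """One round of message passing: average each node with its neighbors."""
    updated = []
    for node, feat in enumerate(features):
        group = [feat] + [features[n] for n in neighbors[node]]
        updated.append([sum(col) / len(group) for col in zip(*group)])
    return updated


def node_contributions(features: List[List[float]],
                       neighbors: Dict[int, List[int]],
                       global_desc: List[float],
                       weights: List[float]) -> List[float]:
    """Return one scalar per node; their sum is the molecule-level prediction."""
    hidden = mean_aggregate(features, neighbors)
    # Append the same global descriptors to every node so that each node
    # sees molecule-level information directly (illustrative assumption).
    augmented = [h + global_desc for h in hidden]
    # Linear node-level readout with fixed weights (hypothetical, not trained).
    return [sum(w * x for w, x in zip(weights, row)) for row in augmented]


# Heavy atoms of ethanol (SMILES "CCO"): C - C - O
neighbors = {0: [1], 1: [0, 2], 2: [1]}
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # one-hot atom type [C, O]
global_desc = [3.0, 0.46]  # assumed descriptors: heavy-atom count, mol. weight / 100
weights = [0.5, -1.0, 0.1, 0.2]  # hypothetical readout weights

contribs = node_contributions(features, neighbors, global_desc, weights)
prediction = sum(contribs)  # molecule-level value as a sum of node terms
most_important = max(range(len(contribs)), key=lambda i: abs(contribs[i]))
print(contribs, prediction, most_important)
```

Because the molecule-level value is a plain sum of per-node terms, the node with the largest absolute term can be read off directly. In a trained model, the readout weights and the descriptors would be learned or computed from the data rather than fixed by hand.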