Transfer learning from custom-tailored virtual molecular databases to real-world organic photosensitizers for catalytic activity prediction.
Naoki Noto, Taiki Nagano, Mikito Fujinami, Ryosuke Kojima, Susumu Saito
Abstract
The scarcity of experimental training data restricts the integration of machine learning into catalysis research. Here, we report on the effectiveness of graph convolutional network (GCN) models pretrained on a molecular topological index, which is not used in typical organic synthesis, for estimating catalytic activity, a task that usually requires a high level of human expertise. For pretraining, we used custom-tailored virtual molecular databases that can be readily constructed using either a systematic generation method or a molecular generator developed in our group. Although 94-99% of the employed virtual molecules are not registered in the PubChem database, the resulting pretrained GCN models improve the prediction of catalytic activity for real-world organic photosensitizers. The results demonstrate the efficiency of the present transfer-learning strategy, which leverages readily obtainable information from self-generated virtual molecules.
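To illustrate why a topological index counts as "readily obtainable" pretraining information, the following minimal sketch computes one such index directly from a molecular graph, with no experiment or quantum-chemical calculation. The abstract does not name the index used, so the Wiener index (the sum of shortest-path bond distances over all atom pairs) is an assumption chosen only as a familiar example. The function name `wiener_index` and the adjacency-list encoding of hydrogen-suppressed graphs are likewise hypothetical.

```python
from collections import deque


def wiener_index(adjacency):
    """Return the Wiener index of a connected, unweighted molecular graph.

    `adjacency` maps each atom index to a list of bonded atom indices
    (hydrogen-suppressed graph). The Wiener index is an illustrative
    choice here, not necessarily the index used in the paper.
    """
    total = 0
    for source in adjacency:
        # Breadth-first search gives shortest path lengths in bonds.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
    # Each unordered pair was counted twice (once from each end).
    return total // 2


# Carbon skeletons of two C4H10 isomers (hypothetical toy inputs).
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

# Label dictionary of the kind that could serve as pretraining targets.
labels = {"butane": wiener_index(butane), "isobutane": wiener_index(isobutane)}
print(labels)
```

Because such a label depends only on connectivity, it can be generated for every molecule in a virtual database at negligible cost, which is the property a pretraining target needs when experimental data are scarce.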