Improved estimation of intrinsic solubility of drug-like molecules through multi-task graph transformer.
Jiaxi Zhao, Eline Hermans, Kia Sepassi, Christophe Tistaert, Christel A S Bergström, Mazen Ahmad, Per Larsson
Abstract
Aqueous solubility plays a crucial role throughout drug discovery and development. Despite numerous efforts using a range of machine learning models, accurately estimating aqueous solubility remains a challenge. One primary limitation is the absence of a large, single-source dataset of drug-like compounds for model training. Additionally, studies have highlighted the need for improvements in prediction algorithms and molecular representations. To address these challenges, Johnson and Johnson (J&J) in-house solubility data were leveraged. Theoretical pH-solubility equations and in-house pKa prediction tools were used to calculate intrinsic solubility from the J&J data. A multi-task graph transformer model was developed and trained on the calculated intrinsic solubility data of 13,306 compounds along with seven relevant physicochemical properties, including solubility at pH 2 and pH 7, logP, and logD at three different pHs. When evaluated on high-quality test data, the developed model achieved a root mean square error (RMSE) of 0.61 and a coefficient of determination (R2) of 0.60, demonstrating state-of-the-art performance in estimating intrinsic solubility for drug-like compounds.
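The abstract does not state which pH-solubility equations were used. As an illustration only, the sketch below assumes the textbook Henderson-Hasselbalch relation for a monoprotic compound, S(pH) = S0 (1 + 10^(pH - pKa)) for an acid and S(pH) = S0 (1 + 10^(pKa - pH)) for a base, and back-calculates the intrinsic solubility S0 from a solubility measured at a known pH. The function name, its arguments, and the example values are hypothetical and are not taken from the paper; multiprotic compounds and the authors' in-house pKa tools are outside its scope.

```python
import math


def intrinsic_log_solubility(log_s_ph, ph, pka, kind):
    """Back-calculate log10 intrinsic solubility (logS0).

    Assumption: a monoprotic compound that follows the
    Henderson-Hasselbalch pH-solubility relation
        acid: S(pH) = S0 * (1 + 10**(pH - pKa))
        base: S(pH) = S0 * (1 + 10**(pKa - pH))
    so that logS0 = logS(pH) - log10(1 + 10**exponent).
    """
    if kind == "neutral":
        # No ionizable group: measured solubility is the intrinsic one.
        return log_s_ph
    if kind == "acid":
        exponent = ph - pka
    elif kind == "base":
        exponent = pka - ph
    else:
        raise ValueError("kind must be 'acid', 'base', or 'neutral'")
    # Subtract the contribution of the ionized fraction.
    return log_s_ph - math.log10(1.0 + 10.0 ** exponent)


if __name__ == "__main__":
    # Hypothetical weak acid (pKa 4.0) with logS = -3.0 measured at pH 7.
    print(intrinsic_log_solubility(-3.0, ph=7.0, pka=4.0, kind="acid"))
```

Under this assumption, a weak acid measured well above its pKa has an intrinsic solubility far lower than its measured solubility, because most of the dissolved compound is ionized at that pH.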