KSMoFinder: knowledge graph embedding of proteins and motifs for predicting kinases of human phosphosites.
Manju Anandakrishnan, Karen E Ross, Chuming Chen, K Vijay-Shanker, Cathy H Wu
Abstract
Motivation: Protein kinases regulate cellular signaling pathways through a cascade of phosphorylation activity, selectively targeting specific residues on substrate proteins (phosphosites). Determining the characteristics of kinases that phosphorylate specific substrates has been studied extensively. Most tools utilize amino acid sequence motifs around phosphosites but do not consider the biological characteristics of the substrate protein. Results: We present KSMoFinder, a kinase-substrate-motif prediction model that learns factors beyond motif similarities by integrating the biological contexts of proteins. We learn the semantics of a knowledge graph containing proteins' contextual relationships, kinase-specific motifs, and motif composition, and represent the proteins and motifs as vectors. Using these representations as features, we train a supervised deep-learning classifier to identify kinase-phosphosite relationships. We use a ground-truth kinase-substrate-motif dataset from iPTMnet and PhosphoSitePlus to evaluate KSMoFinder's prediction performance. Pairwise comparative assessments against prior kinase-substrate prediction tools demonstrate KSMoFinder's superior performance. Trained on our knowledge graph embeddings, KSMoFinder achieves a ROC-AUC of 0.851 and a PR-AUC of 0.839 on a test dataset with equal numbers of positives and negatives, surpassing the performance obtained with embeddings from popular protein language models such as ProtT5, ESM2, and ESM3. Unlike most existing tools, KSMoFinder can make predictions at both the motif level and the substrate protein level. Availability and implementation: Source code is available at https://github.com/manju-anandakrishnan/KSMoFinder.
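For readers less familiar with the two metrics reported above, the following is a minimal sketch of how ROC-AUC and PR-AUC (estimated here as average precision) can be computed on a balanced test set. It is illustrative only and is not taken from the KSMoFinder source code; the function names and the toy labels and scores are assumptions made for this example.

```python
# Illustrative sketch only: not the KSMoFinder evaluation code.
# Function names and toy data below are hypothetical.

def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """PR-AUC estimated as the mean precision at the rank of each positive.

    Tied scores are ordered arbitrarily by the sort; no tie correction is applied.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return precision_sum / hits


# Toy balanced test set: 3 positive and 3 negative kinase-phosphosite pairs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = roc_auc(labels, scores)            # 8 of 9 pairs ranked correctly
ap = average_precision(labels, scores)   # (1/1 + 2/2 + 3/4) / 3
print(f"ROC-AUC={auc:.3f}  PR-AUC={ap:.3f}")
```

On a balanced set such as this one, a random classifier would score about 0.5 on both metrics, which gives a baseline for reading the reported values.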