The role of graph topology in the performance of biomedical Knowledge Graph Completion models.
Alberto Cattaneo, Stephen Bonner, Thomas Martynec, Edward Morrissey, Carlo Luschi, Ian P Barrett, Daniel Justus
Abstract
MOTIVATION: Knowledge Graph Completion has been increasingly adopted as a useful method for addressing several tasks in biomedical research, such as drug repurposing or drug-target identification. To that end, a variety of datasets and Knowledge Graph Embedding models have been proposed over the years. However, little is known about the properties that render a dataset, and associated modelling choices, useful for a given task. Moreover, even though theoretical properties of Knowledge Graph Embedding models are well understood, their practical utility in this field remains controversial.
RESULTS: In this work, we conduct a comprehensive investigation into the topological properties of publicly available biomedical Knowledge Graphs and establish links to the accuracy observed in real-world tasks. By releasing all model predictions and a new suite of analysis tools, we invite the community to build upon our work and continue improving the understanding of these crucial applications.
AVAILABILITY AND IMPLEMENTATION: The code used to perform experiments and analyze results in this article, as well as all experimental data, is available at https://github.com/graphcore-research/kg-topology-toolbox/tree/main/the_role_of_graph_topology_paper and archived on Zenodo at https://doi.org/10.5281/zenodo.12097376.