NOCTIS: open-source toolkit that turns reaction data into actionable graph networks.
Nataliya Lopanitsyna, Marta Pasquini, Marco Stenta
Abstract
BACKGROUND: Chemical reactions form densely connected networks, and exploring these networks is essential for designing efficient and sustainable synthetic routes. As reaction data from the literature, patents, and high-throughput experimentation continue to grow, so does the need for tools that can navigate and mine these large-scale datasets. Graph-based representations capture the topology of reaction space, yet few open-source tools exist for building and querying such networks. To address this gap, we developed NOCTIS, an open-source toolkit for constructing and analyzing reaction data as graphs.

RESULTS: NOCTIS is an open-source Python package for building Networks of Organic Chemistry (NOCs) from reaction strings. It supports graph-based analysis, parallel processing of large datasets, and export to common Python data structures (e.g., NetworkX graphs and pandas DataFrames). Built on Neo4j, it has a modular, extensible architecture with open-source dependencies. We also provide a companion plugin for exhaustive route enumeration: it traverses graph-encoded reactions to assemble all valid synthetic routes, which helps avoid redundant exploration and supports knowledge reuse in synthesis planning. The underlying algorithm is documented in detail, along with its current limitations. Using the MIT USPTO-480k dataset (Adv Neural Inf Process Syst 30, 2017), we demonstrate the plugin's route-mining capabilities, analyze network connectivity, and assess synthetic trees.

CONCLUSION: Built on LinChemIn (J Chem Inf Model 64(6):1765-1771, 2024), NOCTIS is an open and extensible toolkit for network-based reaction analysis and route mining, laying the groundwork for data-driven route design at scale. Future work will extend its query capabilities and improve the efficiency of route extraction.
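To make the idea of building a reaction network from reaction strings and enumerating routes over it more concrete, here is a minimal in-memory sketch. It is not NOCTIS's Neo4j-backed implementation or its plugin's documented algorithm. It assumes reaction strings in reaction-SMILES form `reactants>agents>products` (agents ignored) and links molecules to the reactions that consume and produce them. A route is modeled as a set of reaction ids. A molecule counts as a starting material if it is in a given stock or if no reaction in the data produces it, and cycles are skipped along each branch. The functions `parse_reaction`, `build_network` and `enumerate_routes` are hypothetical names. The molecule labels are toy placeholders, not canonicalized SMILES.

```python
from itertools import product


def parse_reaction(rxn: str) -> tuple[frozenset[str], frozenset[str]]:
    """Split 'reactants>agents>products' into reactant and product sets; agents are ignored."""
    reactants, _agents, products = rxn.split(">")
    return frozenset(reactants.split(".")), frozenset(products.split("."))


def build_network(reaction_strings):
    """Return (reactions, producers): reaction id -> (reactants, products),
    and molecule -> ids of the reactions that produce it."""
    reactions, producers = {}, {}
    for rid, rxn in enumerate(reaction_strings):
        reactants, products = parse_reaction(rxn)
        reactions[rid] = (reactants, products)
        for mol in products:
            producers.setdefault(mol, []).append(rid)
    return reactions, producers


def enumerate_routes(target, reactions, producers, stock=frozenset(), _path=frozenset()):
    """Return every route to `target` as a frozenset of reaction ids.

    A molecule is a leaf (starting material) if it is in `stock` or no
    reaction in the network produces it. Reactions whose reactants are
    already on the current branch are skipped, so cycles are never followed.
    """
    if target in stock or target not in producers:
        return {frozenset()}  # leaf: contributes no reactions
    path = _path | {target}
    routes = set()
    for rid in producers[target]:
        reactants, _products = reactions[rid]
        if reactants & path:
            continue  # would loop back to a molecule on this branch
        # One set of alternative sub-routes per reactant; combine every choice.
        sub_routes = [enumerate_routes(m, reactions, producers, stock, path)
                      for m in sorted(reactants)]
        for combo in product(*sub_routes):
            routes.add(frozenset({rid}).union(*combo))
    return routes


if __name__ == "__main__":
    # Toy labels stand in for canonical SMILES.
    reactions, producers = build_network(["A.B>>C", "C>>D", "E>>D", "D.F>>G"])
    for route in enumerate_routes("G", reactions, producers):
        print(sorted(route))
```

Storing a route as a set of reaction ids makes duplicate routes collapse automatically. Even so, exhaustive enumeration grows combinatorially with the number of alternative reactions per molecule, so this naive recursion only suits small toy networks, not datasets the size of USPTO-480k.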