NifFinder: improved Nif protein prediction using SWeeP vectors and neural networks.
Bruno Thiago de Lima Nichio, Roxana Beatriz Ribeiro Chaves, Jeroniza Nunes Marchaukoski, Fabio de Oliveira Pedrosa, Roberto Tadeu Raittz
Abstract
Motivation: Biological nitrogen fixation is a vital process for global ecosystems and agriculture; however, the diversity and complexity of nif genes make accurate identification of Nif proteins difficult. Existing computational tools are often limited to a narrow subset of nif genes, leaving many important protein classes unexplored. NifFinder was developed to address this gap, combining SWeeP vector representation with neural network models to predict up to 24 different Nif proteins. By expanding the predictive scope and improving accuracy, NifFinder provides a more comprehensive and reliable framework to study nitrogen fixation, supporting both evolutionary insights and applications in agricultural sustainability.
Results: We present NifFinder, a computational framework that integrates SWeeP vector encoding with neural network classifiers to predict up to 24 different Nif protein classes across Archaea and Bacteria. NifFinder achieved an average accuracy of 84.31%, with a sensitivity of 86.49%, a precision of 81.97%, an F1-score of 82.33%, and a class correlation coefficient of 0.94. Benchmarking against curated Nif resources showed strong agreement and robust classification even under class imbalance. By expanding beyond traditional subsets of nif genes, NifFinder enables more reliable genome-wide identification of Nif proteins.
Availability and implementation: The NifFinder installation instructions and source code can be accessed at https://sourceforge.net/projects/NifFinder.
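For readers unfamiliar with the metrics quoted in the Results, the following minimal sketch shows how accuracy, sensitivity, precision, and F1-score can be derived from a multiclass confusion matrix. It is illustrative only: the function name `macro_metrics`, the macro-averaging scheme, and the toy three-class matrix are assumptions made for this example and are not taken from the NifFinder source code or its evaluation protocol.

```python
# Illustrative sketch, not NifFinder code: macro-averaged metrics from a
# confusion matrix where confusion[i][j] counts samples of true class i
# that were predicted as class j. Macro averaging is an assumption here.

def macro_metrics(confusion):
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))

    recalls, precisions, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        actual = sum(confusion[k])                         # row sum: true class k
        predicted = sum(confusion[i][k] for i in range(n))  # column sum: predicted k
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        precisions.append(precision)
        f1s.append(f1)

    return {
        "accuracy": correct / total if total else 0.0,
        "sensitivity": sum(recalls) / n,   # macro-averaged recall
        "precision": sum(precisions) / n,  # macro-averaged precision
        "f1": sum(f1s) / n,                # macro-averaged F1
    }


# Hypothetical three-class example (made-up counts, not NifFinder results).
toy = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
result = macro_metrics(toy)
print(result)
```

Macro averaging weights every class equally, which is one common choice when classes are imbalanced; a tool reporting averaged metrics over 24 classes may use a different averaging scheme.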