An overview of computational methods for gene prediction in eukaryotes: strengths, limitations, and future directions.
Abigail Djossou, Wend Yam D D Ouedraogo, Aida Ouangraoua
Abstract
Summary: Advances in Next-Generation Sequencing (NGS) and machine-learning methods have improved eukaryotic gene prediction. Despite this progress, computational prediction remains crucial for complementing empirical data and annotating newly sequenced genomes, given the complexity of eukaryotic gene structures. Recent deep-learning approaches further improve accuracy by learning gene-structure patterns directly from genomic sequences, which enables stronger cross-species generalization without predefined gene models. This review introduces a new classification of gene prediction methods (gene-model-based, gene-model-free, and hybrid) and examines representative tools with respect to their algorithmic strategies, input data, strengths, and limitations. It also updates previously reported challenges and outlines new issues arising from modern deep-learning techniques. To support these discussions, we extended the G3PO benchmark of gene-model-based predictors (Augustus, GenScan, GeneID, GlimmerHMM, and SNAP) to also include a gene-model-free method, sensor-NN, and a hybrid method, Helixer.
Availability and implementation: Benchmark DNA and protein sequences are available in the G3PO repository (http://git.lbgi.fr/scalzitti/Benchmark_study). Scripts for Augustus and Helixer, along with all prediction outputs, are accessible at https://github.com/UdeS-CoBIUS/GenePredictionReviewBenchmark.