cncFinder: A graph-attention-network-based interpretable learning model to identify bifunctional long non-coding RNAs.
Qiang Tang, Yang Yu, Min Shen, Lin Zhang, Xu Jia, Juanjuan Kang
Abstract
Certain RNAs, termed bifunctional RNAs or coding and non-coding RNAs, exhibit both protein-coding and regulatory non-coding functions. Long non-coding RNAs (lncRNAs), which play crucial roles in gene regulation and cellular processes, represent a major subset of bifunctional RNAs. Accurate identification of bifunctional lncRNAs is critical for advancing RNA biology and uncovering opportunities for biomarker discovery and therapeutic development. Here, we present cncFinder, a graph-attention-network-based model for predicting bifunctional lncRNAs. It transforms lncRNA sequences into k-mer graphs, encodes node features with Word2Vec, and employs a graph attention network to capture higher-order sequence dependencies. On the test dataset, cncFinder significantly outperformed state-of-the-art models. Its robustness and broad applicability were further confirmed through validation on cross-species datasets from mouse and fruit fly. Interpretability analysis revealed that cncFinder captured biologically meaningful motifs, including canonical start codons and Kozak-like elements. In a case study of LINC00961, cncFinder precisely detected an experimentally validated translation initiation motif, highlighting its biological relevance. To support broad accessibility, we developed a user-friendly web server. In summary, cncFinder advances predictive accuracy and interpretability, providing a powerful tool for the systematic discovery of bifunctional lncRNAs and enabling new insights into RNA multifunctionality.
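As an illustration of the first step of the pipeline, the following minimal sketch shows one way a sequence could be turned into a k-mer graph. The abstract does not specify the exact construction used by cncFinder, so the details here are assumptions: the function name `kmer_graph` is hypothetical, k = 3 is an arbitrary default, nodes are taken to be the distinct k-mers, and a directed edge links each k-mer to the one starting one position later, weighted by how often that transition occurs. The Word2Vec node encoding and the graph attention layers are not shown.

```python
from collections import Counter


def kmer_graph(sequence, k=3):
    """Build a directed k-mer graph from a nucleotide sequence.

    Assumed construction (not taken from the paper): nodes are the
    distinct k-mers, and an edge (a, b) means k-mer b starts one
    position after k-mer a somewhere in the sequence. Edge weights
    count how often each transition occurs.
    """
    # Normalise to upper-case RNA letters so DNA-style input also works.
    seq = sequence.upper().replace("T", "U")
    # Slide a window of width k over the sequence, one base at a time.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Assign each distinct k-mer a stable integer node index.
    nodes = sorted(set(kmers))
    index = {kmer: i for i, kmer in enumerate(nodes)}
    # Count transitions between consecutive (overlapping) k-mers.
    edges = Counter((index[a], index[b]) for a, b in zip(kmers, kmers[1:]))
    return nodes, dict(edges)


# Toy sequence for illustration only; it is not from any dataset.
nodes, edges = kmer_graph("AUGGCCAUGG", k=3)
for (src, dst), weight in sorted(edges.items()):
    print(f"{nodes[src]} -> {nodes[dst]} (weight {weight})")
```

In a graph of this kind, each node would then receive a feature vector (for example, a Word2Vec embedding of its k-mer), and the weighted edge list would define the neighbourhoods over which a graph attention layer aggregates information.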