Dual-channel heterogeneous feature fusion neural network for the prediction of post-transcriptional gene expression in Escherichia coli.
Zihe Wang, Zilun Mei, Xiaogang Wang, Jinpeng Zhang, Zhenghong Xu, Fei Liu, Xiaojuan Zhang
Abstract
The sequences and structures of the 5' mRNA regions in prokaryotes significantly affect transcript stability and translation efficiency at the post-transcriptional level. However, the structure-function relationship of the post-transcriptional regulation region remains elusive. Here, we present a dual-channel neural network that integrates sequence and structure features to predict post-transcriptional gene expression in Escherichia coli. Our model combines Word2Vec and k-mer encoding for the initial sequence representation, followed by parallel feature extraction via a CNN and a BiLSTM. An attention mechanism dynamically prioritizes critical elements within both channels. The feature vectors from these modules are concatenated and fed into a fully connected network for the final prediction. Evaluated on randomized train-test splits, the model demonstrates robust performance in classifying expression levels based on 5' mRNA regions and can identify post-transcriptional regulation regions with high translational efficiency, with accuracy reaching 93%. This framework provides a computational tool for optimizing synthetic biology designs by linking sequence architecture to expression outcomes, thereby improving the efficiency of biological sequence design.
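As an illustration of the k-mer encoding step mentioned above, the following minimal sketch splits a 5' mRNA sequence into overlapping k-mers and maps each one to an integer index, the kind of token sequence that an embedding model such as Word2Vec could then consume. The abstract does not state the k-mer length, the stride, or the alphabet handling, so `k=3`, `stride=1`, the T-to-U conversion, and the function names `kmer_tokens`, `build_vocab`, and `encode` are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import product


def kmer_tokens(seq, k=3, stride=1):
    """Split a nucleotide sequence into overlapping k-mers.

    k and stride are illustrative defaults (assumption); the abstract
    does not specify them.
    """
    # Assumption: treat the input as mRNA, so DNA-style T becomes U.
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(k=3, alphabet="ACGU"):
    """Map every possible k-mer to an integer index; 0 is reserved for unknowns."""
    return {"".join(p): i + 1 for i, p in enumerate(product(alphabet, repeat=k))}


def encode(seq, k=3, stride=1):
    """Convert a sequence into a list of integer k-mer indices."""
    vocab = build_vocab(k)
    # k-mers containing symbols outside the alphabet (e.g. N) map to 0.
    return [vocab.get(tok, 0) for tok in kmer_tokens(seq, k, stride)]


# Hypothetical example: a short Shine-Dalgarno-like fragment.
tokens = kmer_tokens("AGGAGGU", k=3)
ids = encode("AGGAGGU", k=3)
```

In a pipeline like the one described, such index sequences would typically be passed to an embedding layer before the CNN and BiLSTM channels; that wiring is not shown here because the abstract gives no layer-level details.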