Integrative genomic and machine learning approaches reveal evolutionary signatures in the winged bean mitochondrial genome.
Nikhil Kumar Singh, Binay K Singh, Piyush Kumar, Avinash Pandey, Sudhir Kumar, Sujit Kumar Bishi, A Pattanayak, V P Bhadana, Sujay Rakshit, Kishor U Tribhuvan
Abstract
The mitochondrial genome of Psophocarpus tetragonolobus (winged bean), a nutritionally valuable yet genomically underexplored tropical legume, was assembled using high-coverage PacBio long reads and Illumina short reads. The 366,925 bp circular genome encodes 64 genes (38 protein-coding, 20 tRNAs, 6 rRNAs) and contains nine fragmented protein-coding genes, indicative of dynamic mitogenome architecture. Repeat profiling revealed 100 dispersed repeats (30-110 bp) and 25 SSRs (4.95% of the genome), with assembly graph inspection and recombination models supporting subgenomic circles and isoforms. Comparative analyses across 15 legumes showed pervasive purifying selection, with positive selection at specific codons of atp4, ccmB, cox1, nad3, and rps10. Codon usage analyses showed that mitochondrial genes exhibit moderate bias largely shaped by mutational pressure, whereas chloroplast genes display stronger selective constraints. Synteny mapping revealed multiple conserved and inverted regions between organelles, highlighting structural divergence. To bridge structural and compositional insights, we developed a novel machine learning framework, trained on 14 codon bias features, that discriminates organelle origin with an AUC of up to 0.96 and identifies GC3s as the most informative predictor. This represents the first ML-based classification of plant organelle genomes and demonstrates that codon composition encodes an evolutionarily conserved "organelle signature". This approach not only elucidates the evolutionary architecture of the P. tetragonolobus mitogenome but also establishes a transferable model for organelle genome classification and comparative analysis across plants and other eukaryotic lineages.
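To make the most informative predictor concrete, the following is a minimal sketch of how GC3s (the G+C fraction at the third position of synonymous codons) could be computed for a single coding sequence. It is not the authors' pipeline: the function name `gc3s`, the exclusion set, and the assumption of the standard genetic code (Met, Trp, and stop codons excluded) are illustrative choices, and the abstract does not state which tool or definition was used.

```python
# Illustrative sketch only; the name gc3s and this exact definition are
# assumptions, not taken from the paper.
# Excluded codons under the standard genetic code (assumed here):
# Met (ATG) and Trp (TGG) have no synonymous alternatives; stops are not counted.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc3s(cds: str) -> float:
    """Return the fraction of synonymous codons ending in G or C."""
    seq = cds.upper().replace("U", "T")
    usable = 0    # codons that count toward GC3s
    gc_third = 0  # usable codons whose third base is G or C
    # Walk complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        # Skip excluded codons and codons with ambiguous bases (e.g. N).
        if codon in EXCLUDED or any(base not in "ACGT" for base in codon):
            continue
        usable += 1
        if codon[2] in "GC":
            gc_third += 1
    if usable == 0:
        raise ValueError("no synonymous codons found in sequence")
    return gc_third / usable


if __name__ == "__main__":
    # ATG, TGG and TAA are excluded; of GCT and GCC, one ends in G/C.
    print(gc3s("ATGGCTGCCTGGTAA"))
```

In a classifier of the kind described, a value like this would be computed per gene and combined with the other codon bias features before training; the remaining 13 features are not enumerated in the abstract and are not reproduced here.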