Long short-term memory-based deep learning model for the discovery of antimicrobial peptides targeting Mycobacterium tuberculosis.
Linfeng Wang, Susana Campino, Taane G Clark, Jody E Phelan
Abstract
Motivation: Tuberculosis (TB), caused by Mycobacterium tuberculosis, remains a global health challenge, compounded by rising antibiotic resistance. Antimicrobial peptides (AMPs) offer a promising alternative owing to their membrane-disruptive activity and low potential for resistance, yet the scarcity of TB-specific AMP data constrains targeted development. We present a reproducible deep learning protocol that combines long short-term memory (LSTM) networks with transfer learning to classify and generate TB-active peptides.

Results: Classifiers were pretrained on a large corpus of general AMPs and then fine-tuned on curated TB-specific sequences using two strategies: a frozen encoder and full backpropagation. We benchmarked four model variants (unidirectional and bidirectional LSTMs, each with and without attention) on a held-out TB test set; the unidirectional LSTM with a frozen encoder performed best (accuracy 90%, AUC 0.97). In parallel, LSTM-based generative models were trained to produce de novo TB-active peptides. A generator trained exclusively on TB data produced 94 of 100 peptides that AMP Scanner predicted to be antimicrobial, outperforming the transfer learning-based generators. Generated peptides were evaluated for antimicrobial activity, toxicity, structure, and AMP-like physicochemical traits, and four candidates shared ≥84% sequence identity with known TB-AMPs.

Availability and implementation: The complete model and data are available at https://github.com/linfeng-wang/TB-AMP-design.
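The abstract contrasts two fine-tuning strategies: a frozen encoder, where only the classification head is updated on TB data, and full backpropagation, where the pretrained encoder is updated as well. The sketch below illustrates that difference under loudly stated assumptions. It is not the authors' implementation: the LSTM encoder is replaced by a hypothetical mean-pooled, per-residue scalar embedding so the gradients can be written by hand in plain Python. The class name `TinyPeptideClassifier`, the learning rate, and the placeholder sequences are all illustrative inventions, not real AMP data.

```python
import copy
import math
import random

# Standard 20-residue amino-acid alphabet; other letters raise KeyError.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class TinyPeptideClassifier:
    """Hypothetical toy encoder + head (a stand-in for an LSTM classifier).

    Encoder: one learned scalar embedding per residue, mean-pooled over the sequence.
    Head: logistic regression on the pooled feature.
    """

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.embed = {aa: rng.uniform(-0.1, 0.1) for aa in AMINO_ACIDS}  # encoder
        self.w = rng.uniform(-0.1, 0.1)  # head weight
        self.b = 0.0  # head bias

    def encode(self, seq):
        # Mean-pool per-residue embeddings into a single scalar feature.
        return sum(self.embed[aa] for aa in seq) / len(seq)

    def predict_proba(self, seq):
        z = self.w * self.encode(seq) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def train_step(self, seq, label, lr=0.5, freeze_encoder=False):
        """One SGD step on binary cross-entropy; returns the pre-update loss."""
        h = self.encode(seq)
        p = self.predict_proba(seq)
        p_safe = min(max(p, 1e-12), 1 - 1e-12)  # clip only for the log
        loss = -(label * math.log(p_safe) + (1 - label) * math.log(1 - p_safe))
        dz = p - label  # dLoss/dz for a sigmoid output with cross-entropy
        if not freeze_encoder:
            # Full backpropagation: the gradient also reaches the encoder.
            # Each occurrence of a residue contributes dz * w / len(seq).
            for aa in seq:
                self.embed[aa] -= lr * dz * self.w / len(seq)
        # The head is updated under both strategies (h was computed pre-update).
        self.w -= lr * dz * h
        self.b -= lr * dz
        return loss


if __name__ == "__main__":
    # Placeholder (sequence, label) pairs for illustration only, not real AMP data.
    general_amps = [("KKLLKKLLKK", 1), ("DEDEDEDEGG", 0)]
    tb_amps = [("KWKLFKKI", 1), ("DDGSEEGN", 0)]

    base = TinyPeptideClassifier(seed=1)
    for _ in range(100):  # "pretraining": every parameter is trainable
        for seq, y in general_amps:
            base.train_step(seq, y)

    for freeze in (True, False):
        model = copy.deepcopy(base)  # both strategies start from the same weights
        for _ in range(20):
            for seq, y in tb_amps:
                model.train_step(seq, y, freeze_encoder=freeze)
        print(f"freeze_encoder={freeze}: P(active | KWKLFKKI) = "
              f"{model.predict_proba('KWKLFKKI'):.3f}")
```

With `freeze_encoder=True`, only `w` and `b` move during fine-tuning, so a small TB dataset cannot pull the pretrained representation far from what was learned on general AMPs; full backpropagation lets the embeddings adapt too, at a higher risk of overfitting. Which choice wins is an empirical question; the abstract reports the frozen-encoder unidirectional LSTM as the best variant on its held-out TB test set.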