Deep generative modeling captures maturation-dependent pairing patterns in human antibodies.
Lea Brönnimann, Thomas Lemmin, Chiara Rodella
Abstract
Understanding antibody heavy-light chain pairing is critical for decoding immune repertoire architecture and designing therapeutic antibodies, yet most sequence databases lack paired chain information. To address this gap, we developed a two-stage deep learning framework. Transformer-based language models were first pre-trained on large corpora of unpaired heavy- and light-chain sequences, then integrated into a sequence-to-sequence model to generate light chains from heavy chain input. Although native light chain recovery was moderate, generated sequences exhibited high germline identity, improved structural quality, and broader framework and complementarity-determining region coverage. Heavy chains from memory B cells generated light chains with more restricted V gene usage, reflecting maturation-dependent selection. Generated κ light chains exhibited a trimodal similarity distribution, indicating distinct functional pairing modes from promiscuous to highly specific. Our approach demonstrates that sequence-to-sequence modeling can uncover inter-chain dependencies and generate plausible antibody pairs, providing a foundation for computational repertoire analysis and therapeutic design.