Properties, Origin, and Consistency of Truncated Proteoforms Across Top-Down Proteomic Studies.
Philipp T Kaulich, James M Fulcher, Andreas Tholey
Abstract
Protein truncation is a common modification that can alter protein localization, interaction, activity, and function. Top-down proteomics targets the identification of all molecular forms in which a protein can exist (termed "proteoforms") and is thus well suited for termini analysis. To examine the properties, origin, and consistency of truncated proteoforms, we performed a meta-analysis of 50 top-down proteomics datasets published over the past decade, covering 140,000 proteoforms derived from 14,500 proteins across various species. On average across all datasets, approximately 71% of proteoforms were truncated, with the vast majority not yet documented in protein databases. Our analysis distinguished between artificial truncations (e.g., sample preparation effects on labile peptide bonds) and endogenous truncations, enabling the identification of novel signal peptides and truncations between structured domains. This study highlights the importance of a common yet understudied mechanism for generating protein diversity and provides a valuable resource for future studies targeting truncated proteoform functions or aiming to reduce artefacts in proteomics sample preparation.