Su Taojunfeng, Fellers Ryan T, Greer Joseph B, LeDuc Richard D, Thomas Paul M, Kelleher Neil L
Department of Molecular Biosciences, Northwestern University, Evanston, Illinois 60208, United States.
Proteomics Center of Excellence, Chemistry of Life Processes Institute, Northwestern University, 4605 Silverman Hall, 2170 Campus Drive, Evanston, Illinois 60208, United States.
J Proteome Res. 2025 Apr 4;24(4):1861-1870. doi: 10.1021/acs.jproteome.4c00943. Epub 2025 Mar 10.
Proteoforms are the distinct molecular forms of proteins that act as building blocks of organisms, and post-translational modifications (PTMs) are one of the key sources of this variation. Mass spectrometry (MS)-based top-down proteomics (TDP) is the leading technology for proteoform identification because it preserves intact proteoforms for analysis, which makes it well suited to comprehensive PTM characterization. A crucial step in TDP is searching MS data against a database of candidate proteoforms. To extend the reach of TDP to organisms with limited PTM annotations, we developed Proteoform-predictor, an open-source tool that integrates homology-based PTM site prediction into proteoform database creation. The tool creates databases of proteoform candidates after registration of homologous sequences, transferring PTM sites from well-characterized species to species with less comprehensive proteomic data. It features a user-friendly interface and an intuitive workflow, making it accessible to a wide range of researchers. We demonstrate that Proteoform-predictor expands the proteoform databases of three bacterial strains by tens of thousands of proteoforms each by comparing them to the reference proteome of Escherichia coli K12. Subsequent TDP analysis for () and () demonstrated significant improvement in protein and proteoform identification, even for proteins with variant sequences. As TDP technology advances, Proteoform-predictor will become an important tool for extending proteoform identification and PTM biology to more diverse species across the phylogenetic tree of life.
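The homology-based transfer of PTM sites described in the abstract can be illustrated with a minimal Python sketch. This is not the Proteoform-predictor implementation: the abstract does not specify the alignment method or the transfer rules, so the sketch assumes a plain Needleman-Wunsch global alignment and assumes that a site is transferred only when the aligned residues are identical. The function names (`global_align`, `transfer_ptm_sites`), the scoring parameters, the toy sequences, and the PTM labels are all hypothetical.

```python
def global_align(ref, tgt, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment (illustrative stand-in only).

    Returns the two sequences padded with '-' so that they have equal length.
    """
    n, m = len(ref), len(tgt)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if ref[i - 1] == tgt[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if ref[i - 1] == tgt[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                a.append(ref[i - 1]); b.append(tgt[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(ref[i - 1]); b.append("-")
            i -= 1
        else:
            a.append("-"); b.append(tgt[j - 1])
            j -= 1
    return "".join(reversed(a)), "".join(reversed(b))


def transfer_ptm_sites(ref, tgt, ref_ptms):
    """Map annotated PTM sites from a reference sequence onto a homolog.

    ref_ptms maps 1-based reference positions to a PTM label.
    Assumption: a site is transferred only if the aligned target residue
    is identical to the reference residue.
    """
    aligned_ref, aligned_tgt = global_align(ref, tgt)
    transferred = {}
    ref_pos = tgt_pos = 0
    for r, t in zip(aligned_ref, aligned_tgt):
        if r != "-":
            ref_pos += 1
        if t != "-":
            tgt_pos += 1
        if r != "-" and t != "-" and r == t and ref_pos in ref_ptms:
            transferred[tgt_pos] = ref_ptms[ref_pos]
    return transferred


# Toy example: the homolog lacks the reference residue T3 and carries an A7G substitution.
reference = "MKTAYSAKQR"
homolog = "MKAYSGKQR"
annotated = {2: "Acetyl", 6: "Phospho", 8: "Acetyl"}
print(transfer_ptm_sites(reference, homolog, annotated))
```

In this toy case the three annotated sites shift to homolog positions 2, 5, and 7, because the deletion upstream moves every later residue by one position. A candidate proteoform database could then enumerate combinations of the transferred sites, although how the actual tool builds its candidates is not stated here.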