Schaffer Leah V, Millikin Robert J, Shortreed Michael R, Scalf Mark, Smith Lloyd M
Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States.
J Proteome Res. 2020 Aug 7;19(8):3510-3517. doi: 10.1021/acs.jproteome.0c00332. Epub 2020 Jul 10.
Cellular functions are performed by a vast and diverse set of proteoforms. Proteoforms are the specific forms of proteins produced as a result of genetic variations, RNA splicing, and post-translational modifications (PTMs). Top-down mass spectrometric analysis of intact proteins enables proteoform identification, including proteoforms derived from sequence cleavage events or harboring multiple PTMs. In contrast, bottom-up proteomics identifies peptides, which necessitates protein inference and does not yield proteoform identifications. We seek here to exploit the synergies between these two data types to improve the quality and depth of the overall proteomic analysis. To this end, we automated the large-scale integration of results from multiprotease bottom-up and top-down analyses in the software program Proteoform Suite and applied it to the analysis of proteoforms from the human Jurkat T lymphocyte cell line. We implemented the recently developed proteoform-level classification scheme for top-down tandem mass spectrometry (MS/MS) identifications in Proteoform Suite, which enables users to observe the level and type of ambiguity for each proteoform identification, including which of the ambiguous proteoform identifications are supported by bottom-up-level evidence. We used Proteoform Suite to find instances where top-down identifications aid in protein inference from bottom-up analysis and conversely where bottom-up peptide identifications aid in proteoform PTM localization. We also show the use of bottom-up data to infer proteoform candidates potentially present in the sample, allowing confirmation of such proteoform candidates by intact-mass analysis of MS1 spectra. The implementation of these capabilities in the freely available software program Proteoform Suite enables users to integrate large-scale top-down and bottom-up data sets and to utilize the synergies between them to improve and extend the proteomic analysis.
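The proteoform-level classification scheme mentioned above assigns each top-down identification a level from 1 (unambiguous) upward, according to which attributes are ambiguous among the candidate proteoforms consistent with the spectrum. The Python sketch below assumes four such attributes (PTM localization, PTM identity, amino acid sequence, and gene of origin), compares each one independently across the candidate set, and checks whether a bottom-up peptide supports a candidate by matching its sequence and modified residues. The classes and functions (`Candidate`, `Peptide`, `proteoform_level`, `supports`), the gene name `GENE1`, and the toy sequence are hypothetical and are not Proteoform Suite's API. The published scheme's exact rules for how these attributes interact may differ from this simplification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    """A proteoform consistent with one top-down MS/MS spectrum (hypothetical model)."""
    gene: str
    sequence: str
    ptms: tuple  # ((0-based residue index, PTM name), ...)

@dataclass(frozen=True)
class Peptide:
    """A bottom-up peptide identification (hypothetical model)."""
    sequence: str
    ptms: tuple  # ((0-based index within the peptide, PTM name), ...)

def ambiguity_flags(candidates):
    """Report which of the four assumed attributes differ across the candidate set."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    return {
        "ptm_localization": len({tuple(sorted(p for p, _ in c.ptms)) for c in candidates}) > 1,
        "ptm_identity": len({tuple(sorted(n for _, n in c.ptms)) for c in candidates}) > 1,
        "sequence": len({c.sequence for c in candidates}) > 1,
        "gene": len({c.gene for c in candidates}) > 1,
    }

def proteoform_level(candidates):
    """Level 1 means no ambiguity; each ambiguous attribute adds one level (maximum 5)."""
    return 1 + sum(ambiguity_flags(candidates).values())

def supports(peptide, candidate):
    """True if the peptide occurs in the candidate with the same PTMs on the same residues."""
    n = len(peptide.sequence)
    start = candidate.sequence.find(peptide.sequence)
    while start != -1:
        # PTMs the candidate places inside this stretch, re-indexed relative to the peptide
        inside = {(pos - start, name) for pos, name in candidate.ptms if start <= pos < start + n}
        if inside == set(peptide.ptms):
            return True
        start = candidate.sequence.find(peptide.sequence, start + 1)
    return False

# Toy example: two candidates that differ only in the phosphorylated residue
a = Candidate("GENE1", "MASTKPEK", ((2, "Phospho"),))  # phosphoserine
b = Candidate("GENE1", "MASTKPEK", ((3, "Phospho"),))  # phosphothreonine
pep = Peptide("ASTK", ((1, "Phospho"),))               # phosphate on the peptide's serine

print(proteoform_level([a, b]))            # 2 (localization ambiguity only)
print(supports(pep, a), supports(pep, b))  # True False
```

Because the two candidates differ only in the phosphosite, the identification sits at level 2, and the bottom-up peptide supports only candidate `a`. This is the kind of peptide-level evidence that can narrow PTM localization for an ambiguous top-down identification.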
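Confirming bottom-up-inferred proteoform candidates by intact-mass analysis of MS1 spectra amounts to comparing each candidate's theoretical mass with the observed intact masses. The sketch below assumes a candidate is given as a sequence plus a list of PTM names. It computes the neutral monoisotopic mass from standard residue masses plus one water, and accepts an observed mass whose error, (observed − theoretical) / theoretical × 10^6, is within a parts-per-million (ppm) tolerance. The names `theoretical_mass` and `confirm_candidates`, the two-entry `PTM_MASS` table, and the 10 ppm default are illustrative assumptions rather than Proteoform Suite's implementation. The toy sequence PEPTIDE stands in for an intact protein.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565  # H2O accounting for the free N- and C-termini
# Hypothetical subset of monoisotopic PTM mass shifts (Da)
PTM_MASS = {"Phospho": 79.966331, "Acetyl": 42.010565}

def theoretical_mass(sequence, ptms=()):
    """Neutral monoisotopic mass: residue masses + water + PTM mass shifts."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + sum(PTM_MASS[p] for p in ptms)

def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def confirm_candidates(candidates, observed_masses, tol_ppm=10.0):
    """Map each candidate name to the observed intact masses within tol_ppm of its theoretical mass."""
    matches = {}
    for name, (sequence, ptms) in candidates.items():
        theo = theoretical_mass(sequence, ptms)
        hits = [m for m in observed_masses if abs(ppm_error(m, theo)) <= tol_ppm]
        if hits:
            matches[name] = hits
    return matches

# Toy candidates inferred from (hypothetical) bottom-up evidence
candidates = {"base": ("PEPTIDE", ()), "phospho": ("PEPTIDE", ("Phospho",))}
print(confirm_candidates(candidates, [799.3605, 1000.0]))  # {'base': [799.3605]}
```

Real intact-mass matching must also handle charge-state deconvolution, monoisotopic peak assignment errors, and distinct proteoforms whose masses coincide within tolerance; this sketch attempts none of these.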