Medizinisches Proteom-Center, Medical Faculty, Ruhr-University Bochum, Bochum, Germany.
Medical Proteome Analysis, Center for Protein Diagnostics (PRODI), Ruhr-University Bochum, Bochum, Germany.
PLoS One. 2022 Oct 21;17(10):e0276401. doi: 10.1371/journal.pone.0276401. eCollection 2022.
In bottom-up proteomics, proteins are enzymatically digested into peptides before measurement with mass spectrometry. The relationship between proteins and their corresponding peptides can be represented by bipartite graphs. We conduct a comprehensive analysis of bipartite graphs using quantified peptides from measured data sets as well as theoretical peptides from an in silico digestion of the corresponding complete taxonomic protein sequence databases. The aim of this study is to characterize and structure the different types of graphs that occur and to compare them between data sets. We observed a large influence of the minimum peptide length accepted during in silico digestion. When changing from theoretical peptides to measured ones, the graph structures are subject to two opposing effects. On the one hand, graphs based on measured peptides are on average smaller and less complex than graphs based on theoretical peptides. On the other hand, the proportion of protein nodes without unique peptides, which are a difficult case for protein inference and quantification, is considerably larger for measured data. Additionally, the proportion of graphs containing at least one protein node without unique peptides rises when going from the database level to the quantitative level. The fractions of shared peptides and of proteins without unique peptides, as well as the complexity and size of the graphs, depend strongly on the data set and organism. Large differences between the structures of bipartite peptide-protein graphs were observed between the database and quantitative levels as well as between the analyzed species. In the analyzed measured data sets, the proportion of protein nodes without unique peptides ranged from 6.4% to 55.0%. This highlights the need for novel methods that can quantify proteins without unique peptides. The knowledge of the structure of bipartite peptide-protein graphs gained in this study will be useful for the development of such algorithms.
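The graph construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes a simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages), and the protein names and sequences are hypothetical toy data chosen only to show shared peptides, a protein without unique peptides, and the effect of the minimum peptide length.

```python
from collections import defaultdict

# Hypothetical toy sequences (not from any real database).
PROTEINS = {
    "P1": "AAAAAKCCCCCK",
    "P2": "CCCCCKDDDDDR",
    "P3": "CCCCCK",
    "P4": "EEEEEKGGK",
    "P5": "GGKLLLLLLK",
}


def digest(sequence, min_length=5):
    """Simplified in silico tryptic digestion (assumed rule): cleave after
    K or R unless followed by P; no missed cleavages; drop short peptides."""
    peptides, start = set(), 0
    for i, aa in enumerate(sequence):
        last = i == len(sequence) - 1
        if last or (aa in "KR" and sequence[i + 1] != "P"):
            peptide = sequence[start:i + 1]
            if len(peptide) >= min_length:
                peptides.add(peptide)
            start = i + 1
    return peptides


def build_graph(proteins, min_length=5):
    """Return the bipartite graph as a mapping peptide -> set of proteins."""
    pep_to_prot = defaultdict(set)
    for name, seq in proteins.items():
        for peptide in digest(seq, min_length):
            pep_to_prot[peptide].add(name)
    return pep_to_prot


def proteins_without_unique(pep_to_prot):
    """Protein nodes whose peptides are all shared with other proteins."""
    all_proteins = set()
    for prots in pep_to_prot.values():
        all_proteins |= prots
    with_unique = {next(iter(p)) for p in pep_to_prot.values() if len(p) == 1}
    return all_proteins - with_unique


def connected_components(pep_to_prot):
    """Split the graph into connected (protein set, peptide set) subgraphs."""
    prot_to_pep = defaultdict(set)
    for peptide, prots in pep_to_prot.items():
        for prot in prots:
            prot_to_pep[prot].add(peptide)
    seen, components = set(), []
    for start in prot_to_pep:
        if start in seen:
            continue
        comp_prots, comp_peps, stack = set(), set(), [start]
        while stack:
            prot = stack.pop()
            if prot in comp_prots:
                continue
            comp_prots.add(prot)
            for peptide in prot_to_pep[prot]:
                comp_peps.add(peptide)
                stack.extend(pep_to_prot[peptide] - comp_prots)
        seen |= comp_prots
        components.append((comp_prots, comp_peps))
    return components


for min_len in (5, 3):
    graph = build_graph(PROTEINS, min_length=min_len)
    print(min_len, len(connected_components(graph)),
          sorted(proteins_without_unique(graph)))
```

On this toy input, P3 is the only protein node without a unique peptide, because its single peptide is shared with P1 and P2. Lowering the minimum peptide length from 5 to 3 admits the short peptide GGK, which links P4 and P5 into one component. This is a small-scale analogue of the sensitivity to minimum peptide length reported above.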