iProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates.

Affiliation

Institute for Systems Biology, Seattle, WA, USA.

Publication information

Mol Cell Proteomics. 2011 Dec;10(12):M111.007690. doi: 10.1074/mcp.M111.007690. Epub 2011 Aug 29.

DOI:10.1074/mcp.M111.007690

PMID:21876204

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3237071/

Abstract

The combination of tandem mass spectrometry and sequence database searching is the method of choice for the identification of peptides and the mapping of proteomes. Over the last several years, the volume of data generated in proteomic studies has increased dramatically, which challenges the computational approaches previously developed for these data. Furthermore, a multitude of search engines have been developed that identify different, overlapping subsets of the sample peptides from a particular set of tandem mass spectra. We present iProphet, a new addition to the Trans-Proteomic Pipeline, a widely used open-source suite of proteomic data analysis tools. Applied in tandem with PeptideProphet, it provides a more accurate representation of the multilevel nature of shotgun proteomic data. iProphet combines the evidence from multiple identifications of the same peptide sequences across different spectra, experiments, precursor ion charge states, and modified states. It also allows accurate and effective integration of the results from multiple database search engines applied to the same data. The use of iProphet in the Trans-Proteomic Pipeline increases the number of correctly identified peptides at a constant false discovery rate as compared with both PeptideProphet and another state-of-the-art tool, Percolator. As the main outcome, iProphet permits the calculation of accurate posterior probabilities and false discovery rate estimates at the level of sequence-identical peptide identifications, which in turn leads to more accurate probability estimates at the protein level. Fully integrated with the Trans-Proteomic Pipeline, it supports all commonly used MS instruments, search engines, and computer platforms. The performance of iProphet is demonstrated on two publicly available data sets: data from a human whole cell lysate proteome profiling experiment representative of typical proteomic data sets, and data from a set of Streptococcus pyogenes experiments more representative of organism-specific composite data sets.
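
The abstract refers to posterior probabilities and false discovery rate estimates at the level of sequence-identical peptides. As a rough illustration of what those quantities look like in practice, the Python sketch below collapses spectrum-level probabilities to one value per peptide sequence and then estimates the FDR from those probabilities. It is not iProphet's model: taking the maximum probability per sequence is a deliberately simple stand-in for the statistical evidence combination the abstract describes, and the function names and example data are hypothetical. The FDR estimate itself is the usual model-based one, the mean of (1 - p) over the accepted identifications.

```python
def collapse_to_peptides(psms):
    """Reduce (sequence, probability) pairs to one probability per sequence.

    Assumption for illustration only: keep the highest probability observed
    for each sequence. iProphet's actual evidence combination is more involved.
    """
    best = {}
    for sequence, prob in psms:
        best[sequence] = max(prob, best.get(sequence, 0.0))
    return best


def model_based_fdr(probabilities, threshold):
    """Estimate the FDR among identifications with probability >= threshold.

    Each accepted identification is wrong with probability (1 - p), so the
    expected fraction of false identifications is the mean of (1 - p).
    """
    accepted = [p for p in probabilities if p >= threshold]
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)


# Hypothetical spectrum-level identifications: (peptide sequence, probability).
psms = [
    ("PEPTIDEK", 0.99),
    ("PEPTIDEK", 0.80),
    ("ELVISLIVESK", 0.95),
    ("SAMPLER", 0.60),
    ("SAMPLER", 0.40),
    ("LOWSCOREK", 0.10),
]

peptides = collapse_to_peptides(psms)
fdr = model_based_fdr(peptides.values(), threshold=0.5)
print(f"{len(peptides)} distinct peptides, estimated FDR at p >= 0.5: {fdr:.3f}")
```

Computing the estimate on the collapsed peptide list rather than on the raw spectrum-level list matters because the same sequence identified many times would otherwise be counted many times, which is one reason peptide-level error estimates differ from spectrum-level ones.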
