Semisupervised model-based validation of peptide identifications in mass spectrometry-based proteomics.

Author information

Choi Hyungwon, Nesvizhskii Alexey I

Affiliation

Department of Pathology and Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA.

Publication information

J Proteome Res. 2008 Jan;7(1):254-65. doi: 10.1021/pr070542g. Epub 2007 Dec 27.

DOI:10.1021/pr070542g

PMID:18159924

Abstract

Development of robust statistical methods for validation of peptide assignments to tandem mass (MS/MS) spectra obtained using database searching remains an important problem. PeptideProphet is one of the commonly used computational tools available for that purpose. An alternative simple approach for validation of peptide assignments is based on addition of decoy (reversed, randomized, or shuffled) sequences to the searched protein sequence database. The probabilistic modeling approach of PeptideProphet and the decoy strategy can be combined within a single semisupervised framework, leading to improved robustness and higher accuracy of computed probabilities even in the case of most challenging data sets. We present a semisupervised expectation-maximization (EM) algorithm for constructing a Bayes classifier for peptide identification using the probability mixture model, extending PeptideProphet to incorporate decoy peptide matches. Using several data sets of varying complexity, from control protein mixtures to a human plasma sample, and using three commonly used database search programs, SEQUEST, MASCOT, and TANDEM/k-score, we illustrate that more accurate mixture estimation leads to an improved control of the false discovery rate in the classification of peptide assignments.
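The semisupervised EM idea described above can be sketched in Python. This sketch rests on simplifying assumptions that are not taken from the paper. Each peptide assignment is reduced to a single score. Both the correct and incorrect score distributions are assumed normal, which need not match the distributions PeptideProphet actually uses. Decoy matches are treated as known incorrect, so they inform only the incorrect component, while the mixing proportion is estimated from target matches alone. All function names are hypothetical; this is not PeptideProphet's implementation. The last helper estimates the FDR of an accepted set as the mean of (1 - P) over the accepted matches, a common model-based estimate.

```python
import math
import random
from typing import List, Sequence, Tuple

Component = Tuple[float, float]  # (mean, variance) of an assumed normal component


def log_normal_pdf(x: float, mean: float, var: float) -> float:
    """Log-density of a normal distribution (the assumed component shape)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def weighted_moments(xs: Sequence[float], ws: Sequence[float]) -> Component:
    """Weighted mean and variance, with a small floor to avoid collapse."""
    total = sum(ws)
    mean = sum(w * x for x, w in zip(xs, ws)) / total
    var = sum(w * (x - mean) ** 2 for x, w in zip(xs, ws)) / total
    return mean, max(var, 1e-6)


def e_step(targets: Sequence[float], decoys: Sequence[float], pi: float,
           c1: Component, c0: Component) -> Tuple[List[float], float]:
    """Posterior P(correct | score) for each target, plus the log-likelihood.

    Decoys are labeled incorrect, so they enter the likelihood only
    through the incorrect component c0.
    """
    post, ll = [], 0.0
    for s in targets:
        l1 = math.log(pi) + log_normal_pdf(s, *c1)
        l0 = math.log(1.0 - pi) + log_normal_pdf(s, *c0)
        m = max(l1, l0)
        lse = m + math.log(math.exp(l1 - m) + math.exp(l0 - m))  # log-sum-exp
        post.append(math.exp(l1 - lse))
        ll += lse
    ll += sum(log_normal_pdf(s, *c0) for s in decoys)
    return post, ll


def semisupervised_em(targets: Sequence[float], decoys: Sequence[float],
                      n_iter: int = 500, tol: float = 1e-8):
    """Fit pi, the correct component c1 and the incorrect component c0."""
    c0 = weighted_moments(decoys, [1.0] * len(decoys))  # decoys seed c0
    top = sorted(targets)[len(targets) // 2:]
    c1 = weighted_moments(top, [1.0] * len(top))  # upper half seeds c1
    pi, prev_ll = 0.5, -math.inf
    for _ in range(n_iter):
        post, ll = e_step(targets, decoys, pi, c1, c0)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: pi is the fraction of correct matches among targets only.
        pi = min(max(sum(post) / len(targets), 1e-6), 1.0 - 1e-6)
        c1 = weighted_moments(targets, post)
        # Incorrect component: soft-labeled targets plus hard-labeled decoys.
        c0 = weighted_moments(list(targets) + list(decoys),
                              [1.0 - p for p in post] + [1.0] * len(decoys))
    post, _ = e_step(targets, decoys, pi, c1, c0)  # posteriors for final fit
    return pi, c1, c0, post


def estimated_fdr(post: Sequence[float], threshold: float) -> float:
    """Model-based FDR among targets accepted at P >= threshold."""
    accepted = [p for p in post if p >= threshold]
    return sum(1.0 - p for p in accepted) / len(accepted) if accepted else 0.0


if __name__ == "__main__":
    random.seed(1)
    # Hypothetical scores: 30% correct ~ N(4, 1), 70% incorrect ~ N(0, 1).
    targets = ([random.gauss(4.0, 1.0) for _ in range(300)]
               + [random.gauss(0.0, 1.0) for _ in range(700)])
    decoys = [random.gauss(0.0, 1.0) for _ in range(700)]  # known incorrect
    pi, c1, c0, post = semisupervised_em(targets, decoys)
    print(f"pi={pi:.3f}  correct={c1}  incorrect={c0}")
    print(f"estimated FDR at P>=0.9: {estimated_fdr(post, 0.9):.4f}")
```

With well-separated synthetic scores like these, the fitted proportion of correct matches should land near the simulated 30%. Real search-engine scores would call for distribution families suited to each program and score.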
