Department of Genome Sciences, University of Washington, Seattle, Washington, USA.
J Proteome Res. 2010 Oct 1;9(10):5346-57. doi: 10.1021/pr100594k.
The problem of identifying proteins from a shotgun proteomics experiment has not been definitively solved. Identifying the proteins in a sample requires ranking them, ideally with interpretable scores. In particular, "degenerate" peptides, which map to multiple proteins, have made such a ranking difficult to compute. The problem of computing posterior probabilities for the proteins, which can be interpreted as confidence in a protein's presence, has been especially daunting. Previous approaches have either ignored the peptide degeneracy problem completely, addressed it by computing a heuristic set of proteins or heuristic posterior probabilities, or estimated the posterior probabilities with sampling methods. We present a probabilistic model for protein identification in tandem mass spectrometry that recognizes peptide degeneracy. We then introduce graph-transforming algorithms that facilitate efficient computation of protein probabilities, even for large data sets. We evaluate our identification procedure on five different well-characterized data sets and demonstrate our ability to efficiently compute high-quality protein posteriors.
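The abstract does not spell out the model or the graph transformations. The sketch below is a minimal illustration that assumes a simple noisy-OR model, and it is not a reproduction of the authors' method. Under that model each protein is present independently with prior `gamma`, each present protein emits each of its peptides with probability `alpha`, a peptide can also appear spuriously with probability `beta`, and peptides are treated as simply observed or not observed. It shows why a degenerate peptide couples the proteins it maps to. It then applies one simple graph transformation, splitting the protein-peptide graph into connected components, so that posteriors can be computed exactly within each component by brute-force enumeration. All function names, parameter values, and peptide strings are hypothetical.

```python
from collections import defaultdict
from itertools import product


def connected_components(protein_to_peptides):
    """Group proteins that share peptides, directly or through other proteins."""
    peptide_to_proteins = defaultdict(set)
    for prot, peps in protein_to_peptides.items():
        for pep in peps:
            peptide_to_proteins[pep].add(prot)

    seen, components = set(), []
    for start in protein_to_peptides:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # depth-first walk over protein -> peptide -> protein edges
            prot = stack.pop()
            comp.append(prot)
            for pep in protein_to_peptides[prot]:
                for other in peptide_to_proteins[pep]:
                    if other not in seen:
                        seen.add(other)
                        stack.append(other)
        components.append(sorted(comp))
    return components


def component_posteriors(proteins, protein_to_peptides, observed, alpha, beta, gamma):
    """Exact marginals for one component by enumerating all 2**n protein sets."""
    peptides = set().union(*(protein_to_peptides[p] for p in proteins))
    marginals = dict.fromkeys(proteins, 0.0)
    total = 0.0
    for states in product((False, True), repeat=len(proteins)):
        present = [p for p, on in zip(proteins, states) if on]
        weight = 1.0
        for on in states:  # hypothetical independent prior gamma on each protein
            weight *= gamma if on else 1.0 - gamma
        for pep in peptides:
            # Noisy-OR (assumed): each present parent emits pep w.p. alpha,
            # and pep appears from noise w.p. beta.
            k = sum(pep in protein_to_peptides[p] for p in present)
            p_emit = 1.0 - (1.0 - beta) * (1.0 - alpha) ** k
            weight *= p_emit if pep in observed else 1.0 - p_emit
        total += weight
        for p in present:
            marginals[p] += weight
    return {p: w / total for p, w in marginals.items()}


def protein_posteriors(protein_to_peptides, observed, alpha=0.9, beta=0.01, gamma=0.5):
    """Posterior P(protein present | observed peptides), one component at a time."""
    posteriors = {}
    for comp in connected_components(protein_to_peptides):
        posteriors.update(
            component_posteriors(comp, protein_to_peptides, observed, alpha, beta, gamma)
        )
    return posteriors


# Toy example: SHAREDR is degenerate (maps to both ProtA and ProtB).
protein_to_peptides = {
    "ProtA": {"PEPTIDEK", "SHAREDR"},
    "ProtB": {"SHAREDR"},
    "ProtC": {"UNIQUEK"},
}
observed = {"PEPTIDEK", "SHAREDR"}
post = protein_posteriors(protein_to_peptides, observed)
for prot, p in sorted(post.items()):
    print(f"{prot}: {p:.3f}")
```

With these hypothetical settings, ProtA comes out near 0.995. ProtB stays near 0.53, close to its prior, because ProtA already explains the shared peptide. ProtC, whose only peptide was not observed, drops to about 0.09. Enumeration costs 2^n per component, so any transformation that keeps components small, such as this component split, directly limits the work.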