School of Software, Dalian University of Technology, Dalian 116621, China.
Bioinformatics. 2012 Nov 15;28(22):2956-62. doi: 10.1093/bioinformatics/bts540. Epub 2012 Sep 6.
Assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is an important issue in shotgun proteomics. The objective of protein inference is to find the subset of proteins that are truly present in the sample. Although many methods have been proposed for protein inference, several issues, such as peptide degeneracy, remain unsolved.
In this article, we present a linear programming model for protein inference. In this model, the variables are a transformation of the joint probability that each peptide/protein pair is present in the sample. Both the peptide probability and the protein probability can then be expressed as linear combinations of these variables. Based on this simple fact, the protein inference problem is formulated as an optimization problem: minimize the number of proteins with non-zero probabilities, subject to the constraint that the difference between the calculated peptide probability and the peptide probability generated by peptide identification algorithms is less than some threshold. This model addresses the peptide degeneracy issue in a rigorous manner by forcing some of the joint probability variables that involve degenerate peptides to be zero. The corresponding inference algorithm is named ProteinLP. We test the performance of ProteinLP on six datasets. Experimental results show that our method is competitive with state-of-the-art protein inference algorithms.
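The formulation described above can be illustrated with a small sketch. It is not the ProteinLP source code, and several details are assumptions made for illustration: the transformation is taken to be z = -ln(1 - p), the count of non-zero proteins is relaxed to a sum of per-protein upper bounds t, the protein probability is recovered as 1 - exp(-t), and the function name `infer_proteins` and the toy data are hypothetical. It uses `scipy.optimize.linprog` as the solver.

```python
import math
from scipy.optimize import linprog


def infer_proteins(peptide_probs, edges, eps=0.0):
    """Toy LP for protein inference (illustrative assumptions throughout).

    peptide_probs: {peptide: probability in [0, 1)} from peptide identification.
    edges: list of (protein, peptide) pairs, i.e. which protein could yield which peptide.
    eps: allowed deviation between calculated and observed transformed peptide scores.
    """
    proteins = sorted({prot for prot, _ in edges})
    n_edges = len(edges)
    n_vars = n_edges + len(proteins)
    # Variable layout: one z per (protein, peptide) edge, then one t per protein.
    t_index = {prot: n_edges + k for k, prot in enumerate(proteins)}

    # Objective (assumed relaxation): minimize the sum of per-protein bounds t.
    cost = [0.0] * n_edges + [1.0] * len(proteins)

    A_ub, b_ub = [], []
    # Each edge variable is bounded by its protein's variable: z_e - t_i <= 0.
    for e, (prot, _) in enumerate(edges):
        row = [0.0] * n_vars
        row[e] = 1.0
        row[t_index[prot]] = -1.0
        A_ub.append(row)
        b_ub.append(0.0)

    # For each peptide, the sum of its edge variables must lie within eps of
    # the transformed observed probability, assumed here to be -ln(1 - p).
    for pep, p in peptide_probs.items():
        target = -math.log(1.0 - p)
        row = [1.0 if q == pep else 0.0 for _, q in edges] + [0.0] * len(proteins)
        A_ub.append(row)
        b_ub.append(target + eps)
        A_ub.append([-v for v in row])
        b_ub.append(-(target - eps))

    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * n_vars)
    if not res.success:
        raise RuntimeError(res.message)
    # Map each protein's bound back to a probability (assumed inverse transform).
    return {prot: 1.0 - math.exp(-res.x[t_index[prot]]) for prot in proteins}


# Hypothetical toy data: peptide "pep3" is degenerate (shared by A and B).
probs = infer_proteins(
    {"pep1": 0.9, "pep2": 0.8, "pep3": 0.7},
    [("A", "pep1"), ("A", "pep2"), ("A", "pep3"), ("B", "pep3")],
)
print(probs)
```

In this toy case protein A already has to be present to explain its unique peptides, so assigning the shared peptide to A costs nothing extra, and the solver drives the variable for the B/pep3 pair, and with it the score of B, to zero. This is the sense in which a model of this kind can resolve a degenerate peptide.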
The source code of our algorithm is available at: https://sourceforge.net/projects/prolp/.
Supplementary data are available at Bioinformatics Online.