School of Software, Dalian University of Technology, China.
Comput Biol Chem. 2013 Apr;43:46-54. doi: 10.1016/j.compbiolchem.2012.12.008. Epub 2013 Jan 12.
Protein inference is an important problem in proteomics research. Its main objective is to select a subset of candidate proteins that best explains the observed peptides. Although many methods have been proposed for this problem, several issues, such as peptide degeneracy and one-hit wonders, remain unsolved. The accurate identification of proteins that are truly present in the sample therefore continues to be a challenging task. Based on the concept of peptide detectability, we formulate protein inference as a constrained Lasso regression problem, which can be solved very efficiently by coordinate descent. The new inference algorithm, named ProteinLasso, uses an ensemble learning strategy to address the selection of the sparsity parameter in the Lasso model. We test the performance of ProteinLasso on three datasets. The experimental results show that ProteinLasso outperforms state-of-the-art protein inference algorithms in both identification accuracy and running efficiency. In addition, we show that ProteinLasso is stable under different parameter specifications. The source code of our algorithm is available at: http://sourceforge.net/projects/proteinlasso.
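To make the formulation above concrete, the following is a minimal sketch of a constrained Lasso solved by coordinate descent. It is not the authors' implementation. The abstract does not state the exact objective, so everything specific here is an assumption made for illustration: the names `D` (peptide-by-protein detectability matrix), `y` (observed peptide scores), and `lam` (sparsity parameter), the box constraint 0 <= x <= 1 on the protein scores, and the use of a plain average over several `lam` values as a stand-in for the ensemble strategy.

```python
import numpy as np


def constrained_lasso_cd(D, y, lam, n_iter=200, tol=1e-8):
    """Minimize 0.5 * ||y - D x||^2 + lam * sum(x) subject to 0 <= x <= 1.

    Assumed formulation for illustration only; the box constraint and the
    variable names are not taken from the abstract.
    """
    n_prot = D.shape[1]
    x = np.zeros(n_prot)
    col_sq = (D ** 2).sum(axis=0)   # squared norm of each protein column
    residual = y - D @ x            # kept up to date after every update
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(n_prot):
            if col_sq[j] == 0.0:
                continue            # protein with no detectable peptide
            # Correlation of column j with the residual that excludes x[j].
            rho = D[:, j] @ residual + col_sq[j] * x[j]
            # Shift by lam (x is non-negative), then project onto [0, 1].
            new_xj = min(1.0, max(0.0, (rho - lam) / col_sq[j]))
            delta = new_xj - x[j]
            if delta != 0.0:
                residual -= D[:, j] * delta
                x[j] = new_xj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return x


def ensemble_scores(D, y, lams):
    """Average the solutions over several sparsity parameters (assumed ensemble)."""
    return np.mean([constrained_lasso_cd(D, y, lam) for lam in lams], axis=0)


# Toy data (hypothetical): 4 peptides, 3 proteins; peptide 1 is shared by
# proteins 0 and 1, which mimics peptide degeneracy.
D = np.array([
    [0.9, 0.0, 0.0],
    [0.8, 0.7, 0.0],
    [0.0, 0.6, 0.0],
    [0.0, 0.0, 0.5],
])
y = D @ np.array([1.0, 0.0, 1.0])   # proteins 0 and 2 are truly present
scores = ensemble_scores(D, y, lams=[0.01, 0.05, 0.1])
print(scores)
```

In this toy case the shared peptide is fully explained by protein 0, so protein 1 receives a score of zero, while protein 2 keeps a positive score even though it is supported by a single peptide. Averaging over several `lam` values avoids committing to a single sparsity level, which is the motivation the abstract gives for the ensemble strategy.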