Sultana Tamanna, Jordan Rick, Lyons-Weiler James
Bioinformatics Analysis Core, Genomics and Proteomics Core Laboratories and Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA.
J Proteomics Bioinform. 2009 Jun 1;2(6):262-273. doi: 10.4172/jpb.1000085.
Correct identification of peptides and proteins in complex biological samples from proteomic mass spectra is a challenging problem in bioinformatics. The sensitivity and specificity of identification algorithms depend on their underlying scoring methods, some being more sensitive and others more specific. For high-throughput, automated peptide identification, control over the algorithms' performance in terms of the trade-off between sensitivity and specificity is desirable. Combinations of algorithms, called 'consensus methods', have been shown to provide more accurate results than individual algorithms. However, due to the proliferation of algorithms and their varied internal settings, a systematic understanding of the relative performance of individual and consensus methods is lacking. We performed an in-depth analysis of various approaches to consensus scoring using known protein mixtures, and evaluated the performance of 2310 settings generated from the consensus of three different search algorithms: Mascot, Sequest, and X!Tandem. Our findings indicate that the union of Mascot, Sequest, and X!Tandem performed well in terms of overall accuracy, and that methods using 80-99.9% protein probability and/or a minimum of 2 peptides and/or 0-50% minimum peptide probability for protein identification achieved, on average, higher overall accuracy than the other consensus methods tested. The results also suggest method selection strategies that provide direct control over sensitivity and specificity.
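The consensus idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `consensus` and `evaluate`, the protein identifiers, and the toy engine outputs are all hypothetical, and it assumes each search engine's output has already been reduced to a set of protein identifiers (for example, after applying protein-probability and minimum-peptide thresholds). A known protein mixture serves as ground truth, and the set of candidate proteins in the search database defines the negatives.

```python
from collections import Counter


def consensus(calls, min_votes):
    """Return proteins reported by at least `min_votes` engines.

    min_votes=1 gives the union, min_votes=len(calls) the intersection.
    """
    votes = Counter(p for proteins in calls.values() for p in set(proteins))
    return {p for p, n in votes.items() if n >= min_votes}


def evaluate(identified, truth, universe):
    """Sensitivity, specificity and accuracy against a known mixture.

    Assumes `identified` and `truth` are subsets of `universe`, and that
    the mixture and its complement are both non-empty.
    """
    tp = len(identified & truth)             # true proteins that were found
    fp = len(identified - truth)             # reported but not in the mixture
    fn = len(truth - identified)             # in the mixture but missed
    tn = len(universe - truth - identified)  # correctly left unreported
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(universe),
    }


# Toy data, invented for illustration only.
universe = {f"P{i}" for i in range(1, 11)}   # candidate proteins P1..P10
truth = {"P1", "P2", "P3", "P4"}             # hypothetical known mixture
calls = {
    "Mascot": {"P1", "P2", "P5"},
    "Sequest": {"P1", "P3", "P6"},
    "X!Tandem": {"P1", "P2", "P3"},
}

union = consensus(calls, min_votes=1)
majority = consensus(calls, min_votes=2)
intersection = consensus(calls, min_votes=3)

for name, ids in [("union", union), ("majority", majority),
                  ("intersection", intersection)]:
    print(name, sorted(ids), evaluate(ids, truth, universe))
```

Raising `min_votes` moves a method toward specificity and lowering it moves toward sensitivity, which is one simple way to picture the control over the trade-off that the abstract refers to. The toy numbers say nothing about the relative performance reported in the paper.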