Shadforth Ian, Crowther Daniel, Bessant Conrad
Cranfield Centre for Bioinformatics and IT, Cranfield University, Silsoe, UK.
Proteomics. 2005 Nov;5(16):4082-95. doi: 10.1002/pmic.200402091.
Current proteomics experiments can generate vast quantities of data very quickly, but this has not been matched by data analysis capabilities. Although there have been a number of recent reviews covering various aspects of peptide and protein identification methods using mass spectrometry (MS), comparisons of which methods are either the most appropriate for, or the most effective at, their proposed tasks are not readily available. As the need for high-throughput, automated peptide and protein identification systems increases, the creators of such pipelines need to be able to choose algorithms that are going to perform well both in terms of accuracy and computational efficiency. This article therefore provides a review of the currently available core algorithms for peptide mass fingerprinting (PMF), database searching using tandem MS (MS/MS), sequence tag searches and de novo sequencing. We also assess the relative performances of a number of these algorithms. As there is limited reporting of such information in the literature, we conclude that there is a need for the adoption of a system of standardised reporting on the performance of new peptide and protein identification algorithms, based upon freely available datasets. We go on to present our initial suggestions for the format and content of these datasets.