pFind 2.0: a software package for peptide and protein identification via tandem mass spectrometry.

Author information

Wang Le-Heng, Li De-Quan, Fu Yan, Wang Hai-Peng, Zhang Jing-Fen, Yuan Zuo-Fei, Sun Rui-Xiang, Zeng Rong, He Si-Min, Gao Wen

Affiliation

Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, P.R. China.

Publication information

Rapid Commun Mass Spectrom. 2007;21(18):2985-91. doi: 10.1002/rcm.3173.

DOI:10.1002/rcm.3173

PMID:17702057

Abstract

This paper describes the pFind 2.0 software package for peptide and protein identification via tandem mass spectrometry. First, the most important feature of pFind 2.0 is that it offers a modular, customizable platform on which third parties can test and compare their algorithms. Developers can create their own modules following the open application programming interface (API) standards and then add them to workflows in place of the default modules. In addition, to accommodate different requirements, the package provides four automated workflows that differ in their algorithm modules, execution processes, and result reports. Based on this design, pFind 2.0 provides an automated target-decoy database search strategy: the user simply specifies a desired false positive rate (FPR) and starts the search, and the system returns protein identification results automatically filtered at that estimated FPR. Second, pFind 2.0 is also highly accurate and fast. Many pragmatic preprocessing, peptide-scoring, validation, and protein inference algorithms have been incorporated. To speed up searching, a toolbox for indexing protein databases has been developed for high-throughput applications, and all modules are implemented under a new architecture designed for large-scale parallel and distributed searching. An experiment on a public dataset shows that pFind 2.0 identifies more peptides than SEQUEST and Mascot at a 1% FPR. It is also demonstrated that pFind 2.0 has better usability and higher speed than previous versions of pFind. The software and more detailed supplementary information are both available at http://pfind.ict.ac.cn/.
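The target-decoy filtering step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not pFind's implementation or API. The `PSM` record, the `filter_by_fpr` function, and the estimator (number of decoy hits divided by number of target hits above a score threshold) are assumptions chosen for illustration. The sketch also works on peptide-spectrum matches rather than on the protein-level results the abstract describes.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PSM:
    """Hypothetical peptide-spectrum match record (not a pFind type)."""
    spectrum_id: str
    peptide: str
    score: float      # assumed: a higher score means a better match
    is_decoy: bool    # True if the peptide came from the decoy database


def filter_by_fpr(psms: Iterable[PSM], max_fpr: float) -> List[PSM]:
    """Keep the largest top-scoring set whose estimated FPR is <= max_fpr.

    The FPR is estimated here as (#decoy hits) / (#target hits) among all
    matches ranked at or above the cutoff; other estimators exist.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = 0
    decoys = 0
    cutoff = 0  # number of top-ranked matches to keep
    for rank, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Extend the cutoff whenever the running estimate is acceptable.
        if targets > 0 and decoys / targets <= max_fpr:
            cutoff = rank
    # Report only target matches; decoys exist solely to estimate the error.
    return [p for p in ranked[:cutoff] if not p.is_decoy]
```

Under these assumptions, a caller would pass the combined target and decoy search results along with a threshold such as `max_fpr=0.01`, mirroring the "specify an FPR and start searching" workflow described above.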
