School of Information Technologies, University of Sydney, NSW 2006, Australia.
IEEE/ACM Trans Comput Biol Bioinform. 2012 Sep-Oct;9(5):1273-80. doi: 10.1109/TCBB.2012.86.
Accurate protein identification is a critical component of mass spectrometry (MS)-based proteomics. Database search algorithms commonly generate a list of peptide-spectrum matches (PSMs). The validity of these PSMs is critical for downstream analysis because the proteins present in the sample are inferred from them. A variety of postprocessing algorithms have been proposed to validate and filter PSMs. The most popular include Percolator, a semi-supervised learning (SSL) approach, and PeptideProphet, an empirical modeling approach. However, both are predominantly designed for commercial database search algorithms, namely SEQUEST and MASCOT. It is therefore highly desirable to extend and optimize these PSM postprocessing algorithms for open-source database search algorithms such as X!Tandem. In this paper, we propose a Self-boosted Percolator for postprocessing X!Tandem search results. We find that the SSL algorithm used by Percolator depends heavily on the initial ranking of PSMs, so starting from a poor PSM ranking may cause Percolator to perform suboptimally. By implementing Percolator in a cascade learning manner, we can progressively improve performance through multiple boost runs, enabling many more PSM identifications without sacrificing false discovery rate (FDR) control.
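The cascade idea described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. Percolator itself trains a linear SVM, whereas this sketch swaps in a simple centroid-difference scorer to stay short. All function names, the feature layout, and the stopping rule (stop when a boost run no longer increases the number of accepted target PSMs) are assumptions made for illustration, and the target-decoy FDR estimate used here is the plain decoys/targets ratio.

```python
from typing import List, Sequence, Tuple


def confident_targets(scores: Sequence[float], is_decoy: Sequence[bool],
                      fdr: float) -> List[int]:
    """Indices of target PSMs accepted at the given target-decoy FDR estimate."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    targets: List[int] = []
    decoys = 0
    cutoff = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets.append(i)
        # Estimated FDR at this rank: decoys / targets (a simple assumption).
        if targets and decoys / len(targets) <= fdr:
            cutoff = len(targets)
    return targets[:cutoff]


def centroid_weights(features: Sequence[Sequence[float]],
                     pos: Sequence[int], neg: Sequence[int]) -> List[float]:
    """Stand-in for the SVM: difference between positive and negative centroids."""
    dim = len(features[0])

    def mean(idx: Sequence[int]) -> List[float]:
        return [sum(features[i][d] for i in idx) / len(idx) for d in range(dim)]

    return [p - n for p, n in zip(mean(pos), mean(neg))]


def rescore(features: Sequence[Sequence[float]], is_decoy: Sequence[bool],
            init_scores: Sequence[float], fdr: float = 0.01,
            iterations: int = 3) -> List[float]:
    """One semi-supervised run: confident targets vs. decoys, repeated."""
    scores = list(init_scores)
    decoys = [i for i, d in enumerate(is_decoy) if d]
    for _ in range(iterations):
        pos = confident_targets(scores, is_decoy, fdr)
        if not pos or not decoys:
            break  # nothing to learn from
        w = centroid_weights(features, pos, decoys)
        scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in features]
    return scores


def self_boosted(features: Sequence[Sequence[float]], is_decoy: Sequence[bool],
                 init_scores: Sequence[float], fdr: float = 0.01,
                 max_boosts: int = 5) -> Tuple[List[float], int]:
    """Cascade: each run starts from the ranking produced by the previous run."""
    best_scores = list(init_scores)
    best_n = len(confident_targets(best_scores, is_decoy, fdr))
    for _ in range(max_boosts):
        scores = rescore(features, is_decoy, best_scores, fdr)
        n = len(confident_targets(scores, is_decoy, fdr))
        if n <= best_n:
            break  # assumed stopping rule: no further gain in accepted PSMs
        best_scores, best_n = scores, n
    return best_scores, best_n
```

Because a boost run is kept only when it accepts more target PSMs at the same FDR threshold, the number of accepted PSMs under this sketch never falls below what the initial ranking gives. How the published method chooses features and decides when to stop boosting is described in the paper itself.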