VIB-UGent Center for Medical Biotechnology, Ghent, Belgium.
Department of Biomolecular Medicine, Faculty of Medicine, Ghent, Belgium.
Bioinformatics. 2019 Dec 15;35(24):5243-5248. doi: 10.1093/bioinformatics/btz383.
Post-processing tools that maximize the information gained from a proteomics search engine are widely accepted and used by the community. The most notable example is Percolator, a semi-supervised machine learning model that learns a new scoring function for a given dataset. The use of such tools is, however, bound to the search engine's scoring scheme, which does not always make full use of the intensity information present in a spectrum. We aim to show how Percolator can be applied in a way that maximizes the use of spectrum intensity information by leveraging another machine learning-based tool, MS2PIP, which predicts fragment ion peak intensities.
We show how comparing predicted intensities to annotated experimental spectra by calculating direct similarity metrics provides enough information for a tool such as Percolator to accurately separate two classes of peptide-to-spectrum matches. This approach uses more of the information in the data than simpler intensity-based metrics, such as peak counting or summing explained intensities, while maintaining control of statistics such as the false discovery rate.
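As a rough illustration of what "direct similarity metrics" between predicted and observed fragment intensities could look like, the following Python sketch computes three such values for a single peptide-to-spectrum match. The choice of metrics (Pearson correlation, cosine similarity, mean squared error) and the function names are assumptions made for this example, not the feature set defined in ms2rescore. The sketch also assumes the two intensity vectors are already aligned ion by ion, and it does not call MS2PIP or Percolator.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def cosine(x, y):
    """Cosine similarity of two equal-length sequences (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def intensity_features(predicted, observed):
    """Hypothetical helper: similarity features for one peptide-to-spectrum match.

    `predicted` and `observed` are assumed to hold intensities for the same
    fragment ions in the same order.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("intensity vectors must not be empty")
    mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)
    return {
        "pearson": pearson(predicted, observed),
        "cosine": cosine(predicted, observed),
        "mse": mse,
    }


if __name__ == "__main__":
    # Made-up intensities, for illustration only.
    print(intensity_features([0.1, 0.5, 0.4, 0.0], [0.2, 0.4, 0.3, 0.1]))
```

In a rescoring setup, values like these would be supplied to Percolator as extra feature columns next to the search engine's own features, so that the learned scoring function can weigh intensity agreement directly.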
All of the code is available online at https://github.com/compomics/ms2rescore.
Supplementary data are available at Bioinformatics online.