Department of Communications Engineering, University of the Basque Country (UPV/EHU), 48013 Bilbao, Spain.
Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28049 Madrid, Spain.
J Proteome Res. 2020 Mar 6;19(3):1285-1297. doi: 10.1021/acs.jproteome.9b00819. Epub 2020 Feb 25.
Shotgun proteomics is the method of choice for high-throughput protein identification; however, robust statistical methods are essential to automate this task while minimizing the number of false identifications. The standard method for estimating the false discovery rate (FDR) of individual identifications and keeping it below a threshold (typically 1%) is the target-decoy approach. However, numerous studies have shown that the FDR at the protein level may become much larger than the FDR at the peptide level. An appropriate scoring model for identifying proteins from their peptides in high-throughput shotgun proteomics is therefore much needed. In this study, we present a novel protein-level scoring algorithm that uses the scores of the identified peptides and maintains all of the properties expected of a true protein probability. We also present a refinement of the method used to calculate the FDR at the protein level. These algorithms can be used together as a robust identification workflow suitable for large-scale proteomics, and we show that the identification performance of this workflow is superior to that of other widely used methods across several samples and with different search engines. Our protein probability model offers the scientific community an algorithm that is easy to integrate into protein identification workflows for the automated analysis of shotgun proteomics data.
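For readers unfamiliar with the target-decoy approach mentioned in the abstract, the following is a minimal sketch of the generic estimate (decoy hits divided by target hits above a score cutoff), assuming separate target and decoy identifications and that higher scores are better. It is not the protein-level scoring algorithm or the refined FDR calculation presented in the paper. The function names and the `(score, is_decoy)` tuple layout are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical record layout: (score, is_decoy); higher score = better match.
Identification = Tuple[float, bool]


def estimate_fdr(ids: List[Identification], threshold: float) -> float:
    """Generic target-decoy estimate: decoys / targets at or above the threshold."""
    targets = sum(1 for score, is_decoy in ids if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in ids if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so no false discoveries to estimate
    return decoys / targets


def lowest_threshold_for_fdr(
    ids: List[Identification], max_fdr: float = 0.01
) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR stays within max_fdr.

    Quadratic in the number of identifications; fine for a sketch,
    too slow for a real data set.
    """
    best = None
    for candidate in sorted({score for score, _ in ids}, reverse=True):
        if estimate_fdr(ids, candidate) <= max_fdr:
            best = candidate  # keep lowering the cutoff while the estimate holds
    return best


if __name__ == "__main__":
    # Made-up scores for illustration only.
    example = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
    print(estimate_fdr(example, 6.0))
    print(lowest_threshold_for_fdr(example, max_fdr=0.01))
```

The same counting can be applied to peptide-spectrum matches, peptides, or proteins; the abstract's point is that a cutoff that controls the estimate at one level does not automatically control it at the protein level.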