Chair of Proteomics and Bioanalytics, Technical University of Munich (TUM), Freising, Germany.
Mol Cell Proteomics. 2022 Dec;21(12):100437. doi: 10.1016/j.mcpro.2022.100437. Epub 2022 Nov 1.
Estimating false discovery rates (FDRs) of protein identification continues to be an important topic in mass spectrometry-based proteomics, particularly when analyzing very large datasets. One performant method for this purpose is the Picked Protein FDR approach, which is based on a target-decoy competition strategy on the protein level that ensures that FDRs scale to large datasets. Here, we present an extension to this method that can also deal with protein groups, that is, proteins that share common peptides such as protein isoforms of the same gene. To obtain well-calibrated FDR estimates that preserve protein identification sensitivity, we introduce two novel ideas: first, the picked group target-decoy strategy and, second, the rescued subset grouping strategy. Using entrapment searches and simulated data for validation, we demonstrate that the new Picked Protein Group FDR method produces accurate protein group-level FDR estimates regardless of the size of the dataset. The validation analysis also uncovered that applying the commonly used Occam's razor principle leads to anticonservative FDR estimates for large datasets. This is not the case for the Picked Protein Group FDR method. Reanalysis of deep proteomes of 29 human tissues showed that the new method identified up to 4% more protein groups than MaxQuant. Applying the method to the reanalysis of the entire human section of ProteomicsDB led to the identification of 18,000 protein groups at 1% protein group-level FDR. The analysis also showed that about 1250 genes were represented by ≥2 identified protein groups. To make the method accessible to the proteomics community, we provide a software tool including a graphical user interface that enables merging results from multiple MaxQuant searches into a single list of identified and quantified protein groups.
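The protein-level target-decoy competition that the abstract builds on can be illustrated with a minimal sketch. It covers only the basic "picked" idea for single proteins, not the picked group target-decoy or rescued subset grouping strategies, whose details the abstract does not give. The function name, the `REV__` decoy prefix, the tie-breaking rule, and the decoys/targets FDR estimator are all assumptions made for illustration and are not taken from the authors' software.

```python
def picked_protein_qvalues(scores, decoy_prefix="REV__"):
    """Illustrative picked target-decoy competition on the protein level.

    scores: dict mapping protein ID -> score (higher is better).
    A decoy is assumed to be named decoy_prefix + <target ID> (hypothetical
    naming convention chosen for this sketch).
    Returns a list of (target ID, score, q-value), best score first.
    """
    # Step 1: each target competes with its own decoy; only the
    # higher-scoring member of the pair survives. Ties go to the decoy,
    # which is the conservative choice (an assumption of this sketch).
    best = {}
    for protein, score in scores.items():
        is_decoy = protein.startswith(decoy_prefix)
        base = protein[len(decoy_prefix):] if is_decoy else protein
        current = best.get(base)
        if (current is None or score > current[0]
                or (score == current[0] and is_decoy)):
            best[base] = (score, is_decoy)

    # Step 2: rank the surviving entries by score, best first.
    ranked = sorted(best.items(), key=lambda kv: kv[1][0], reverse=True)

    # Step 3: running FDR estimate = decoys / targets above each score.
    rows = []
    targets = decoys = 0
    for base, (score, is_decoy) in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        rows.append((base, score, is_decoy, decoys / max(targets, 1)))

    # Step 4: q-value = lowest FDR reachable at this score or any worse one.
    results = []
    min_fdr = float("inf")
    for base, score, is_decoy, fdr in reversed(rows):
        min_fdr = min(min_fdr, fdr)
        if not is_decoy:
            results.append((base, score, min_fdr))
    results.reverse()
    return results


if __name__ == "__main__":
    example = {
        "P1": 10, "REV__P1": 2,
        "P2": 8, "REV__P2": 9,   # decoy wins this pair
        "P3": 5,
        "P4": 7, "REV__P4": 1,
    }
    for protein, score, q in picked_protein_qvalues(example):
        print(protein, score, round(q, 3))
```

Because each target-decoy pair contributes at most one entry to the ranked list, the number of surviving decoys does not keep growing as more data are added, which is the property that lets the estimate scale to large datasets.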