A new estimation of protein-level false discovery rate.

Affiliations

The Dental Center of China-Japan Friendship Hospital, Beijing, China.

ShenZhen Research Institute of Big Data, ShenZhen, China.

Publication information

BMC Genomics. 2018 Aug 13;19(Suppl 6):567. doi: 10.1186/s12864-018-4923-3.

DOI:10.1186/s12864-018-4923-3

PMID:30367581

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6101079/

Abstract

BACKGROUND

In mass spectrometry-based proteomics, protein identification is an essential task. Evaluating the statistical significance of the protein identification result is critical to the success of proteomics studies. Controlling the false discovery rate (FDR) is the most common method for assuring the overall quality of the set of identifications. Existing FDR estimation methods either rely on specific assumptions or use a two-stage calculation that first estimates error rates at the peptide level and then combines them at the protein level. We propose to estimate the FDR in a non-parametric way that makes fewer assumptions and avoids the two-stage calculation.

RESULTS

We propose a new protein-level FDR estimation framework. The framework contains two major components: the Permutation+BH (Benjamini-Hochberg) FDR estimation method and the logistic regression-based null inference method. In Permutation+BH, the null distribution of a sample is generated by searching the data against a large number of permuted random protein databases, and therefore does not rely on specific assumptions. Then, p-values of proteins are calculated from the null distribution, and the BH procedure is applied to the p-values to obtain the relationship between the FDR and the number of protein identifications. Because the Permutation+BH method generates the null distribution by permutation, it is inefficient for online identification. A logistic regression model is therefore proposed to infer the null distribution of a new sample from existing null distributions obtained with the Permutation+BH method.
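The p-value and BH steps described above can be illustrated with a minimal sketch. It assumes that each protein already has a single identification score (higher is better) and that the searches against permuted databases have been pooled into one array of null scores; how the paper scores proteins and builds the permuted databases is not stated in this abstract. The function names `empirical_p_values`, `bh_adjust`, and `identifications_at_fdr` are hypothetical, and the add-one correction in the empirical p-value is a common convention rather than something taken from the paper.

```python
import numpy as np


def empirical_p_values(observed, null_scores):
    """Right-tailed empirical p-values from pooled null scores.

    Assumption: higher scores mean stronger evidence. The +1 terms keep
    every p-value strictly positive (a common convention, not from the paper).
    """
    null_sorted = np.sort(np.asarray(null_scores, dtype=float))
    observed = np.asarray(observed, dtype=float)
    # Count the null scores that are >= each observed score.
    n_ge = null_sorted.size - np.searchsorted(null_sorted, observed, side="left")
    return (n_ge + 1) / (null_sorted.size + 1)


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # p_(i) * m / i for the sorted p-values, i = 1..m.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity by taking running minima from the largest p-value down.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted


def identifications_at_fdr(p_values, fdr):
    """Number of proteins whose BH-adjusted p-value is at most `fdr`."""
    return int(np.sum(bh_adjust(p_values) <= fdr))
```

Calling `identifications_at_fdr` over a grid of thresholds traces the relationship between the FDR and the number of protein identifications that the abstract refers to.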

CONCLUSIONS

In our experiments on three publicly available datasets, our Permutation+BH method consistently outperforms MAYU, which was chosen as the benchmark FDR calculation method for this study. The null distribution inference results show that the logistic regression model achieves reasonable results both in the shape of the null distribution and in the corresponding FDR estimates.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5644/6101079/36831a57befc/12864_2018_4923_Fig1_HTML.jpg
