Meta-analysis for protein identification: a case study on yeast data.

Affiliation

Bioinformatics & High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington 98101, USA.

Publication information

OMICS. 2010 Jun;14(3):309-14. doi: 10.1089/omi.2010.0034.

DOI:10.1089/omi.2010.0034

PMID:20569183

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3133781/

Abstract

Large amounts of mass spectrometry (MS) proteomics data are now publicly available; however, little attention has been given to how to best combine these data and assess the error rates for protein identification. The objective of this article is to show how variation in the type and amount of data included with each study impacts coverage of the yeast proteome and estimation of the false discovery rate (FDR). Our analysis of a subset of the publicly available yeast data showed that failure to reevaluate the FDR when combining protein IDs from different experiments resulted in an underestimation of the FDR by approximately threefold. A worst-case approximation of the FDR was only slightly larger than estimating the FDR by randomized database matches. The use of a weighted model to emphasize the most informative experimental data provided an increase in the number of IDs at a 1% FDR when compared to other meta-analysis approaches. Also, using an FDR higher than 1% results in a very high rate of false discoveries for IDs above the 1% threshold. Ideally, raw MS data will be made publicly available for complete and consistent reanalysis. In the circumstance that raw data is not available, determining a combined FDR on the basis of the worst-case estimation provides a reasonable approximation of the FDR. When combining experimental results, adding additional experiments results in diminishing and in some cases negative returns on protein identifications. It may be beneficial to include only those experiments generating the most unique identifications due to solid experimental design and sensitive instrumentation.
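The abstract does not spell out how the worst-case combined FDR is computed. One plausible reading, assumed here purely for illustration, is that true identifications overlap between experiments while false identifications never do, so the expected false IDs add up across experiments and the union of IDs grows more slowly. The function name and the input layout below are hypothetical and are not taken from the article.

```python
def worst_case_combined_fdr(experiments):
    """Upper-bound the FDR of the union of several protein ID lists.

    experiments: iterable of (protein_ids, fdr) pairs, where protein_ids is
    any iterable of identifiers and fdr is that experiment's reported FDR.

    Assumption (not from the article): every false ID is unique to its
    experiment, so expected false IDs simply sum over experiments.
    """
    union = set()
    expected_false = 0.0
    for protein_ids, fdr in experiments:
        ids = set(protein_ids)
        union |= ids                      # true IDs overlap, so the union grows slowly
        expected_false += fdr * len(ids)  # false IDs assumed never to overlap
    if not union:
        return 0.0
    # An FDR cannot exceed 1, so cap the bound.
    return min(1.0, expected_false / len(union))


if __name__ == "__main__":
    # Three synthetic, heavily overlapping experiments, each reported at 1% FDR.
    synthetic = [
        (range(0, 1000), 0.01),
        (range(100, 1100), 0.01),
        (range(200, 1200), 0.01),
    ]
    print(worst_case_combined_fdr(synthetic))
```

With these synthetic inputs the union holds 1,200 IDs while the expected false IDs total 30, so the bound is well above the 1% reported for each experiment individually. This illustrates why reusing the per-experiment FDR for a combined list understates the error rate; the numbers are made up and do not reproduce the article's threefold figure.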
