Meta-analysis for protein identification: a case study on yeast data.

Affiliation

Bioinformatics & High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington 98101, USA.

Publication

OMICS. 2010 Jun;14(3):309-14. doi: 10.1089/omi.2010.0034.

Abstract

Large amounts of mass spectrometry (MS) proteomics data are now publicly available; however, little attention has been given to how to best combine these data and assess the error rates for protein identification. The objective of this article is to show how variation in the type and amount of data included with each study impacts coverage of the yeast proteome and estimation of the false discovery rate (FDR). Our analysis of a subset of the publicly available yeast data showed that failure to reevaluate the FDR when combining protein IDs from different experiments resulted in an underestimation of the FDR by approximately threefold. A worst-case approximation of the FDR was only slightly larger than estimating the FDR by randomized database matches. The use of a weighted model to emphasize the most informative experimental data provided an increase in the number of IDs at a 1% FDR when compared to other meta-analysis approaches. Also, using an FDR higher than 1% results in a very high rate of false discoveries for IDs above the 1% threshold. Ideally, raw MS data will be made publicly available for complete and consistent reanalysis. In the circumstance that raw data is not available, determining a combined FDR on the basis of the worst-case estimation provides a reasonable approximation of the FDR. When combining experimental results, adding additional experiments results in diminishing and in some cases negative returns on protein identifications. It may be beneficial to include only those experiments generating the most unique identifications due to solid experimental design and sensitive instrumentation.
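The abstract's central point is that an FDR must be re-estimated after protein ID lists from different experiments are pooled. A small sketch makes this concrete. It assumes each experiment is available only as a set of protein IDs plus the FDR it was filtered at. It reads the "worst case" as the assumption that every experiment's expected false identifications are distinct proteins, so none of them collapse in the union. That interpretation, the names `decoy_fdr` and `worst_case_combined_fdr`, and the toy data are hypothetical illustrations, not the authors' code. `decoy_fdr` shows the simplest randomized (decoy) database estimate: decoy hits divided by target hits above one score cutoff. Other target-decoy formulas exist.

```python
"""Sketch: FDR bookkeeping when pooling protein IDs from several experiments.

Assumptions (hypothetical, not taken from the paper): each experiment is a set
of protein IDs plus the FDR it was filtered at, and the "worst case" treats
every experiment's expected false IDs as distinct proteins.
"""
from typing import Iterable, Set, Tuple

Experiment = Tuple[Set[str], float]  # (protein IDs, FDR the list was filtered at)


def decoy_fdr(n_target_hits: int, n_decoy_hits: int) -> float:
    """Simple target-decoy estimate: decoy hits / target hits above one cutoff."""
    if n_target_hits == 0:
        return 0.0
    return n_decoy_hits / n_target_hits


def worst_case_combined_fdr(experiments: Iterable[Experiment]) -> float:
    """Worst-case FDR for the union of several ID lists.

    Expected false IDs per experiment = len(ids) * fdr. In the worst case none
    of them coincide, so all of them survive into the pooled (union) list,
    while true IDs shared between experiments are counted only once.
    """
    union: Set[str] = set()
    expected_false = 0.0
    for ids, fdr in experiments:
        union |= ids
        expected_false += len(ids) * fdr
    if not union:
        return 0.0
    return min(1.0, expected_false / len(union))


if __name__ == "__main__":
    # Hypothetical data: three overlapping experiments, 1000 IDs each at 1% FDR.
    exps = [
        ({f"P{i}" for i in range(0, 1000)}, 0.01),
        ({f"P{i}" for i in range(100, 1100)}, 0.01),
        ({f"P{i}" for i in range(200, 1200)}, 0.01),
    ]
    # Pool the experiments one at a time: unique IDs vs. worst-case FDR.
    for k in range(1, len(exps) + 1):
        pooled = exps[:k]
        n_unique = len(set().union(*(ids for ids, _ in pooled)))
        print(k, n_unique, round(worst_case_combined_fdr(pooled), 4))
```

Run as a script, the toy data print `1 1000 0.01`, `2 1100 0.0182` and `3 1200 0.025`. Each added experiment contributes only 100 new IDs, while the worst-case FDR of the pooled list climbs from 1% to 2.5%, so simply reusing the per-experiment 1% would understate it. These numbers are illustrative only and are not the paper's results.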

Similar articles

A new estimation of protein-level false discovery rate.
BMC Genomics. 2018 Aug 13;19(Suppl 6):567. doi: 10.1186/s12864-018-4923-3.

Cited by

MOPED: Model Organism Protein Expression Database.
Nucleic Acids Res. 2012 Jan;40(Database issue):D1093-9. doi: 10.1093/nar/gkr1177. Epub 2011 Dec 1.
