Experimental and statistical considerations to avoid false conclusions in proteomics studies using differential in-gel electrophoresis.

Author information

Karp Natasha A, McCormick Paul S, Russell Matthew R, Lilley Kathryn S

Affiliation

Department of Biochemistry, University of Cambridge, Building O, Downing Site, Cambridge CB2 1QW, United Kingdom.

Publication information

Mol Cell Proteomics. 2007 Aug;6(8):1354-64. doi: 10.1074/mcp.M600274-MCP200. Epub 2007 May 17.

Abstract

In quantitative proteomics, the false discovery rate (FDR) can be defined as the number of false positives within statistically significant changes in expression. False positives accumulate during the simultaneous testing of expression changes across hundreds or thousands of protein or peptide species when univariate tests such as the Student's t test are used. Currently most researchers rely solely on the estimation of p values and a significance threshold, but this approach may result in false positives because it does not account for the multiple testing effect. For each species, a measure of significance in terms of the FDR can be calculated, producing individual q values. The q value maintains power by allowing the investigator to achieve an acceptable level of true or false positives within the calls of significance. The q value approach relies on the use of the correct statistical test for the experimental design. In this situation, a uniform p value frequency distribution when there are no differences in expression between two samples should be obtained. Here we report a bias in p value distribution in the case of a three-dye DIGE experiment where no changes in expression are occurring. The bias was shown to arise from correlation in the data from the use of a common internal standard. With a two-dye schema, where each sample has its own internal standard, such bias was removed, enabling the application of the q value to two different proteomics studies. In the case of the first study, we demonstrate that 80% of calls of significance by the more traditional method are false positives. In the second, we show that calculating the q value gives the user control over the FDR. These studies demonstrate the power and ease of use of the q value in correcting for multiple testing. This work also highlights the need for robust experimental design that includes the appropriate application of statistical procedures.
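
To make the contrast between a fixed p value threshold and an FDR-based measure concrete, the sketch below computes Benjamini-Hochberg adjusted p values. This is an assumption made for illustration: the abstract does not state which q value estimator the authors used, and the Benjamini-Hochberg adjustment corresponds to a conservative q value estimate in which every tested species is assumed to be unchanged. The function name `bh_q_values` and the six p values are hypothetical and are not data from the study.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order.

    Assumption: the proportion of true nulls is taken as 1, so these are
    conservative q value estimates, not necessarily the authors' estimator.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so each q value is no larger
    # than the q value of any species with a larger p value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical p values for six protein spots (illustrative only).
p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
q = bh_q_values(p)

# Number of spots called significant at 0.05 by each criterion.
called_by_p = sum(1 for x in p if x < 0.05)
called_by_q = sum(1 for x in q if x < 0.05)
print(called_by_p, called_by_q)
```

With these made-up inputs, fewer spots pass the 0.05 cutoff on the adjusted scale than on the raw p value scale, which is the multiple testing effect the abstract describes. As the abstract notes, any such adjustment presupposes that the p values come from a test suited to the experimental design, so that they are uniformly distributed when no expression changes occur.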
