Estimation of false discovery rates in multiple testing: application to gene microarray data.

Authors

Tsai Chen-An, Hsueh Huey-miin, Chen James J

Affiliation

Division of Biometry and Risk Assessment, National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas, USA.

Publication

Biometrics. 2003 Dec;59(4):1071-81. doi: 10.1111/j.0006-341x.2003.00123.x.

Abstract

Testing for significance with gene expression data from DNA microarray experiments involves simultaneous comparisons of hundreds or thousands of genes. If R denotes the number of rejections (genes declared significant) and V denotes the number of false rejections, then V/R, if R > 0, is the proportion of falsely rejected hypotheses. This paper proposes a model for the distribution of the number of rejections, R, and for the conditional distribution of V given R, written V | R. Under the independence assumption, the distribution of R is a convolution of two binomials and V | R follows a noncentral hypergeometric distribution. Under an equicorrelated model, the distributions are more complex and are also derived. Five false discovery rate error measures are considered: FDR = E(V/R), pFDR = E(V/R | R > 0) (positive FDR), cFDR = E(V/R | R = r) (conditional FDR), mFDR = E(V)/E(R) (marginal FDR), and eFDR = E(V)/r (empirical FDR). The pFDR, cFDR, and mFDR are shown to be equivalent under the Bayesian framework, in which the number of true null hypotheses is modeled as a random variable. We present a parametric and a bootstrap procedure to estimate the FDRs. Monte Carlo simulations were conducted to evaluate the performance of these two methods. The bootstrap procedure appears to perform reasonably well, even when the alternative hypotheses are correlated (rho = 0.25). An example from a toxicogenomic microarray experiment is presented for illustration.
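
The independence case described in the abstract can be illustrated with a small sketch. It is not the paper's parametric or bootstrap estimator. It assumes a fixed number of true nulls `m0` and false nulls `m1`, that every true null is rejected with the same probability `alpha`, and that every false null is rejected with the same probability `power`; these names and the common-probability simplification are assumptions made here for illustration. Under them, R is the sum of two independent binomial counts, and the conditional FDR, E(V/R given R = r), can be computed exactly from the noncentral hypergeometric weights.

```python
import random
from math import comb


def simulate_fdr_measures(m0, m1, alpha, power, n_sim=5000, seed=0):
    """Monte Carlo estimates of FDR, pFDR and mFDR under independence.

    Assumption (not from the abstract): each of the m0 true nulls is
    rejected with probability alpha and each of the m1 false nulls with
    probability power, all tests independent.
    """
    rng = random.Random(seed)
    sum_ratio = 0.0   # running sum of V/R, counting V/R as 0 when R == 0
    sum_v = 0         # running sum of V
    sum_r = 0         # running sum of R
    n_positive = 0    # number of replicates with R > 0
    for _ in range(n_sim):
        v = sum(rng.random() < alpha for _ in range(m0))  # false rejections
        s = sum(rng.random() < power for _ in range(m1))  # true rejections
        r = v + s                                         # R = V + S
        sum_v += v
        sum_r += r
        if r > 0:
            n_positive += 1
            sum_ratio += v / r
    return {
        "FDR": sum_ratio / n_sim,                                     # E(V/R)
        "pFDR": sum_ratio / n_positive if n_positive else float("nan"),
        "mFDR": sum_v / sum_r if sum_r else float("nan"),             # E(V)/E(R)
    }


def conditional_fdr(m0, m1, alpha, power, r):
    """Exact cFDR = E(V/R given R = r) for r > 0 under the same assumptions.

    Given R = r, P(V = v) is proportional to
    C(m0, v) * C(m1, r - v) * omega**v, a noncentral hypergeometric
    distribution with odds ratio omega.
    """
    omega = (alpha / (1 - alpha)) * ((1 - power) / power)
    lo, hi = max(0, r - m1), min(m0, r)   # support of V given R = r
    weights = {v: comb(m0, v) * comb(m1, r - v) * omega ** v
               for v in range(lo, hi + 1)}
    total = sum(weights.values())
    return sum(v * w for v, w in weights.items()) / (r * total)


if __name__ == "__main__":
    print(simulate_fdr_measures(m0=90, m1=10, alpha=0.05, power=0.8))
    print(conditional_fdr(m0=90, m1=10, alpha=0.05, power=0.8, r=12))
```

In this setup the marginal FDR has the closed form m0·alpha / (m0·alpha + m1·power), which gives a quick sanity check on the simulated `mFDR`. When R = 0 is very unlikely, the simulated `FDR` and `pFDR` are nearly identical, since they differ only in whether replicates with no rejections are counted in the denominator.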
