Estimation of false discovery rates in multiple testing: application to gene microarray data.

Author information

Tsai Chen-An, Hsueh Huey-miin, Chen James J

Affiliation

Division of Biometry and Risk Assessment, National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas, USA.

Publication information

Biometrics. 2003 Dec;59(4):1071-81. doi: 10.1111/j.0006-341x.2003.00123.x.

Abstract

Testing for significance with gene expression data from DNA microarray experiments involves simultaneous comparisons of hundreds or thousands of genes. If R denotes the number of rejections (genes declared significant) and V denotes the number of false rejections, then V/R, if R > 0, is the proportion of falsely rejected hypotheses. This paper proposes a model for the distribution of the number of rejections and for the conditional distribution of V given R, V | R. Under the independence assumption, the distribution of R is a convolution of two binomials and V | R has a noncentral hypergeometric distribution. Under an equicorrelated model, the distributions are more complex and are also derived. Five false discovery rate probability error measures are considered: FDR = E(V/R), pFDR = E(V/R | R > 0) (positive FDR), cFDR = E(V/R | R = r) (conditional FDR), mFDR = E(V)/E(R) (marginal FDR), and eFDR = E(V)/r (empirical FDR). The pFDR, cFDR, and mFDR are shown to be equivalent under the Bayesian framework, in which the number of true null hypotheses is modeled as a random variable. We present a parametric and a bootstrap procedure to estimate the FDRs. Monte Carlo simulations were conducted to evaluate the performance of these two methods. The bootstrap procedure appears to perform reasonably well, even when the alternative hypotheses are correlated (rho = 0.25). An example from a toxicogenomic microarray experiment is presented for illustration.
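
The five measures defined in the abstract can be made concrete with a small exact calculation under the independence model. The sketch below is illustrative only and is not the paper's parametric or bootstrap estimator. It assumes that the numbers of true and false null hypotheses (m0, m1), the per-test level alpha, and a common per-test power are all known, and that V ~ Binomial(m0, alpha) and S ~ Binomial(m1, power) are independent with R = V + S. The function and parameter names are hypothetical.

```python
from math import comb


def binom_pmf(n, p):
    """Return [P(X = k) for k = 0..n] for X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def fdr_measures(m0, m1, alpha, power):
    """Exact FDR, pFDR, cFDR and mFDR under the assumed independence model.

    m0, m1 : numbers of true and false null hypotheses (assumed known)
    alpha  : rejection probability for each true null
    power  : rejection probability for each false null
    """
    pv = binom_pmf(m0, alpha)  # V: false rejections
    ps = binom_pmf(m1, power)  # S: true rejections
    m = m0 + m1
    p_r = [0.0] * (m + 1)   # P(R = r): convolution of the two binomials
    ev_r = [0.0] * (m + 1)  # sum over v of v * P(V = v, R = r)
    for v, a in enumerate(pv):
        for s, b in enumerate(ps):
            p_r[v + s] += a * b
            ev_r[v + s] += v * a * b
    # cFDR(r) = E(V/R | R = r) = E(V | R = r) / r
    cfdr = {r: ev_r[r] / (r * p_r[r]) for r in range(1, m + 1) if p_r[r] > 0}
    # FDR = E(V/R), with V/R taken as 0 when R = 0
    fdr = sum(ev_r[r] / r for r in range(1, m + 1))
    # pFDR = E(V/R | R > 0) = FDR / P(R > 0)
    pfdr = fdr / (1 - p_r[0])
    # mFDR = E(V) / E(R)
    mfdr = (m0 * alpha) / (m0 * alpha + m1 * power)
    return {"FDR": fdr, "pFDR": pfdr, "mFDR": mfdr, "cFDR": cfdr, "P(R=r)": p_r}


def efdr(m0, alpha, r):
    """eFDR = E(V)/r for an observed number of rejections r > 0."""
    return m0 * alpha / r


if __name__ == "__main__":
    out = fdr_measures(m0=90, m1=10, alpha=0.05, power=0.8)
    print({k: out[k] for k in ("FDR", "pFDR", "mFDR")})
    print("cFDR at r = 12:", out["cFDR"][12])
    print("eFDR at r = 12:", efdr(90, 0.05, 12))
```

In practice m0 and the power are unknown and must be estimated from the data, which is the role of the parametric and bootstrap procedures mentioned in the abstract.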

