Guo Wenge, Sarkar Sanat K, Peddada Shyamal D
Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA.
Biometrics. 2010 Jun;66(2):485-92. doi: 10.1111/j.1541-0420.2009.01292.x. Epub 2009 Jul 23.
Microarray gene expression studies over ordered categories are routinely conducted to gain insight into the biological functions of genes and the underlying biological processes. Common examples are time-course and dose-response experiments, in which a tissue or cell line is exposed to a chemical at different doses and/or for different durations. A goal of such studies is to identify gene expression patterns (profiles) over the ordered categories. This can be formulated as a multiple testing problem in which, for each gene, the null hypothesis of no difference between successive mean expression levels is tested and, if it is rejected, further directional decisions are made. Many existing multiple testing procedures are designed to control the usual false discovery rate (FDR) rather than the mixed directional FDR (mdFDR), the expected proportion of Type I and directional errors among all rejections. Benjamini and Yekutieli (2005, Journal of the American Statistical Association 100, 71-93) proved that an augmentation of the usual Benjamini-Hochberg (BH) procedure can control the mdFDR when testing simple null hypotheses against two-sided alternatives on one-dimensional parameters. In this article, we consider the problem of controlling the mdFDR when the parameters are multidimensional. To this end, we develop a procedure that extends that of Benjamini and Yekutieli by applying a Bonferroni test to each gene. A proof of its mdFDR control is given for the case where the underlying test statistics are independent across genes. We also report a simulation study comparing its performance with other relevant procedures when the test statistics are independent and when they are dependent across genes. Finally, the proposed methodology is applied to time-course microarray data obtained by Lobenhofer et al. (2002, Molecular Endocrinology 16, 1215-1229).
We identified several important cell-cycle genes, such as the DNA replication/repair gene MCM4 and replication factor subunit C2, that were not identified by the previous analyses of the same data by Lobenhofer et al. (2002) and Peddada et al. (2003, Bioinformatics 19, 834-841). Although some of our findings overlap with previous findings, we also identified several other genes that complement the results of Lobenhofer et al. (2002).
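To make the two-stage idea concrete, here is a minimal Python sketch of one natural reading of "extending Benjamini and Yekutieli based on the Bonferroni test for each gene". It is not the authors' code, and the stage-2 cutoff R*q/(m*k) is an assumption. It further assumes each gene i has k two-sided p-values p_ij, one for each difference between successive mean expressions, with matching test statistics whose sign gives the direction. The helper names `bh_reject` and `directional_bonferroni_bh` are hypothetical.

```python
import numpy as np


def bh_reject(pvals, q):
    """Benjamini-Hochberg step-up at level q: (boolean rejection mask, count R)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m  # i * q / m for i = 1..m
    below = p[order] <= thresholds
    if not below.any():
        return np.zeros(m, dtype=bool), 0
    R = int(np.max(np.nonzero(below)[0])) + 1  # largest i with p_(i) <= i q / m
    reject = np.zeros(m, dtype=bool)
    reject[order[:R]] = True
    return reject, R


def directional_bonferroni_bh(pvals, stats, q=0.05):
    """
    Hypothetical two-stage sketch (assumed form, not the published algorithm):
      Stage 1: per gene, Bonferroni-combine the k component p-values,
               min_j (k * p_ij) capped at 1, and run BH over the m genes.
      Stage 2: within each of the R rejected genes, declare a direction for
               component j when p_ij <= R * q / (m * k), using sign(stats[i, j]).
    Returns an m x k int array: +1 (increase), -1 (decrease), 0 (no decision).
    """
    p = np.asarray(pvals, dtype=float)
    t = np.asarray(stats, dtype=float)
    m, k = p.shape
    gene_p = np.minimum(k * p.min(axis=1), 1.0)
    gene_reject, R = bh_reject(gene_p, q)
    decisions = np.zeros((m, k), dtype=int)
    if R == 0:
        return decisions
    cutoff = R * q / (m * k)
    # Broadcast the per-gene rejection mask across the k components.
    component = gene_reject[:, None] & (p <= cutoff)
    decisions[component] = np.sign(t[component]).astype(int)
    return decisions


# Toy example: 4 genes, k = 2 successive differences per gene.
pvals = [[0.001, 0.20], [0.30, 0.004], [0.50, 0.60], [0.02, 0.01]]
stats = [[3.1, 1.0], [-0.5, -2.9], [0.2, -0.1], [2.0, -2.5]]
print(directional_bonferroni_bh(pvals, stats, q=0.05).tolist())
```

In the toy example, stage 1 rejects genes 1, 2 and 4 (R = 3), so the stage-2 cutoff is 3(0.05)/(4·2) = 0.01875. A rejected gene always has at least one component below this cutoff, because its Bonferroni p-value k·min_j p_ij is at most R·q/m. With k = 1, the sketch reduces to BH at level q followed by sign-based directional decisions, which is the one-dimensional setting described above.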