Qiu Jing, Cui Xiangqin
Department of Statistics, University of Missouri, Columbia, Missouri, USA.
J Biopharm Stat. 2010 Mar;20(2):240-66. doi: 10.1080/10543400903572738.
Microarray technology is commonly used to identify differentially expressed (DE) genes across conditions. A related issue that has rarely been discussed but is equally important is the identification of commonly expressed or constantly expressed genes across different organs, tissues, or species. A common practice in the literature for such studies is to apply a differential expression analysis and conclude that a gene is unchanged if there is no statistical evidence of differential expression. However, genes that are not statistically significantly DE could be (1) truly non-DE genes or (2) truly DE genes that the test of differential expression fails to detect because of low power resulting from a high noise level or a lack of replication. Therefore, treating genes that are not statistically significantly DE as non-DE genes risks including truly DE genes without controlling such errors. We argue that to identify genes that are truly non-DE, one needs to provide statistical evidence through valid statistical tests with appropriate type I error rate control. In this paper, we consider the identification of non-DE genes through statistical equivalence tests under the framework of multiple testing. In particular, we consider the average equivalence criterion and study the power and false discovery rate (FDR) of the standard average equivalence test, the "two one-sided tests" (TOST), through extensive simulation studies based on real microarray data sets. We study the effects of various factors that can affect the power and FDR of the equivalence test, including the proportion of non-DE genes. We also compare the ROC curves of the equivalence test with those of the naive method of selecting genes that are not statistically significantly DE.
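To make the TOST idea concrete, here is a minimal sketch of a per-gene average equivalence test followed by a multiple-testing adjustment. It assumes log-scale expression matrices (genes by replicates), a pooled-variance two-sample t statistic, and a user-chosen equivalence margin `delta`; all of these, as well as the helper names `tost_pvalues` and `bh_adjust`, are illustrative assumptions rather than the authors' exact procedure. The Benjamini-Hochberg step-up adjustment is also an assumed choice of FDR procedure. A gene is declared equivalent (non-DE) only when both one-sided tests reject, so its TOST p-value is the larger of the two one-sided p-values.

```python
import numpy as np
from scipy import stats


def tost_pvalues(x, y, delta):
    """Per-gene TOST p-values for average equivalence, |mu_x - mu_y| < delta.

    x, y: arrays of shape (genes, replicates), e.g. log2 expression (assumed).
    delta: equivalence margin on the same scale (hypothetical choice).
    Uses a pooled-variance two-sample t statistic for each gene (assumption).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.shape[1], y.shape[1]
    df = n1 + n2 - 2
    diff = x.mean(axis=1) - y.mean(axis=1)
    # Pooled variance across the two conditions, computed gene by gene.
    sp2 = ((n1 - 1) * x.var(axis=1, ddof=1)
           + (n2 - 1) * y.var(axis=1, ddof=1)) / df
    se = np.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    # One-sided test 1: H0 diff <= -delta vs H1 diff > -delta.
    p_lower = stats.t.sf((diff + delta) / se, df)
    # One-sided test 2: H0 diff >= delta vs H1 diff < delta.
    p_upper = stats.t.cdf((diff - delta) / se, df)
    # Equivalence requires both one-sided tests to reject.
    return np.maximum(p_lower, p_upper)


def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, taking running minima from the largest p-value down.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


if __name__ == "__main__":
    # Simulated example: 1000 genes, 4 replicates per condition, of which
    # the first 200 are shifted by 2 log units (hypothetical setup).
    rng = np.random.default_rng(0)
    n_genes, reps = 1000, 4
    x = rng.normal(8.0, 0.2, size=(n_genes, reps))
    shift = np.zeros(n_genes)
    shift[:200] = 2.0
    y = rng.normal(8.0, 0.2, size=(n_genes, reps)) + shift[:, None]
    q = bh_adjust(tost_pvalues(x, y, delta=0.5))
    non_de_calls = q < 0.05
    print("genes called non-DE:", int(non_de_calls.sum()))
    print("truly DE genes wrongly called non-DE:", int(non_de_calls[:200].sum()))
```

The design choice worth noting is the direction of the null hypothesis: in TOST the null is "the gene is DE by at least `delta`", so rejecting it gives direct evidence of non-differential expression, and the FDR is controlled over the genes declared non-DE. The naive approach described above instead reads a large DE-test p-value as evidence of equivalence, which leaves such errors uncontrolled.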