Research Centre for Genes, Environment and Human Health, College of Public Health, National Taiwan University, Taipei, Taiwan ROC.
Int J Epidemiol. 2010 Dec;39(6):1597-604. doi: 10.1093/ije/dyq093. Epub 2010 Jun 2.
Microarray technology provides expression data on hundreds to thousands of genes in a single experiment. To search for disease-related genes, researchers test for genes that are differentially expressed between case subjects and control subjects.
The authors propose a new test, the 'half Student's t-test', specifically for detecting differentially expressed genes in heterogeneous diseases. Monte-Carlo simulation shows that the test maintains the nominal α level well for both normal and non-normal distributions. The power of the half Student's t-test is higher than that of the conventional 'pooled' Student's t-test when there is heterogeneity in the disease under study. The power gain from using the half Student's t-test can reach ∼10% when the standard deviation of the case group is 50% larger than that of the control group.
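The abstract does not give the formula for the half Student's t statistic, so the following Python sketch is illustrative only. `pooled_t` is the standard pooled-variance Student's t statistic. `control_sd_t` is a hypothetical variant, assumed here for illustration, that estimates the variance from the control group alone; it is one plausible reading of "half" and should not be taken as the authors' definition.

```python
import math
import statistics


def pooled_t(cases, controls):
    """Conventional pooled-variance Student's t statistic (cases minus controls)."""
    n1, n0 = len(cases), len(controls)
    # Pooled variance: weighted average of the two sample variances.
    s2_pooled = (
        (n1 - 1) * statistics.variance(cases)
        + (n0 - 1) * statistics.variance(controls)
    ) / (n1 + n0 - 2)
    diff = statistics.mean(cases) - statistics.mean(controls)
    return diff / math.sqrt(s2_pooled * (1 / n1 + 1 / n0))


def control_sd_t(cases, controls):
    """Hypothetical variant (an assumption, not the paper's stated formula):
    the variance is estimated from the control group only, so extra spread
    among heterogeneous cases does not inflate the denominator."""
    n1, n0 = len(cases), len(controls)
    s2_control = statistics.variance(controls)
    diff = statistics.mean(cases) - statistics.mean(controls)
    return diff / math.sqrt(s2_control * (1 / n1 + 1 / n0))


if __name__ == "__main__":
    # Toy expression values: the case group is more spread out than the controls.
    cases = [2.0, 4.0, 6.0]
    controls = [0.0, 1.0, 2.0]
    print("pooled t:", pooled_t(cases, controls))
    print("control-SD t:", control_sd_t(cases, controls))
```

When the case group is more variable than the control group, the pooled variance is pulled upward and the pooled statistic shrinks, which is consistent with the loss of power the abstract attributes to the pooled test under heterogeneity.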
Application to a colon cancer data set reveals that when the false discovery rate (FDR) is controlled at 0.05, the half Student's t-test detects 344 differentially expressed genes, whereas the pooled Student's t-test detects only 65. Alternatively, if only 50 genes are to be selected, the FDR for the pooled Student's t-test has to be set at 0.0320 (false positive rate of ∼3%), but for the half Student's t-test it can be as low as 0.0001 (false positive rate of about one per ten thousand).
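The abstract does not say which FDR-controlling procedure was used. As a rough illustration of how a gene list is obtained at a given FDR level, here is a minimal sketch of the Benjamini-Hochberg step-up procedure, assumed here as a common choice; the function name and the toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of hypotheses rejected by the Benjamini-Hochberg
    step-up procedure at FDR level q (indices refer to the input order)."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Track the largest rank k such that p_(k) <= k * q / m.
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    return sorted(order[:k_max])


if __name__ == "__main__":
    # Toy per-gene p-values (made up for illustration, not from the paper).
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p, q=0.05))
```

Lowering `q` shrinks the selected list, which mirrors the trade-off described above: a test that yields smaller p-values for truly differential genes can reach the same list size at a much stricter FDR.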
The half Student's t-test is recommended for detecting differentially expressed genes in heterogeneous diseases.