Bickel David R
Medical College of Georgia, Office of Biostatistics and Bioinformatics, Augusta, GA 30912-4900, USA.
Bioinformatics. 2004 Mar 22;20(5):682-8. doi: 10.1093/bioinformatics/btg468. Epub 2004 Jan 22.
Many methods of identifying differential expression in genes depend on testing the null hypotheses of exactly equal means or distributions of expression levels for each gene across groups, even though a statistically significant difference in the expression level does not imply the occurrence of any difference of biological or clinical significance. This is because a mathematical definition of 'differential expression' as any non-zero difference does not correspond to the differential expression biologists seek. Furthermore, while some current methods account for multiple comparisons in hypothesis tests, they do not accordingly adjust estimates of the degrees to which genes are differentially expressed. Both problems lead to overstating the relevance of findings.
Testing whether genes have relevant differential expression can be accomplished with customized null hypotheses, thereby redefining 'differential expression' in a way that is more biologically meaningful. When such tests control the false discovery rate, they effectively discover genes based on a desired quantile of differential gene expression. Estimation of the degree to which genes are differentially expressed has been corrected for multiple comparisons.
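As a rough illustration of a customized null hypothesis combined with false discovery rate control, the sketch below tests, for each gene, the null hypothesis that the absolute difference in group means is at most a threshold `delta`, then applies the Benjamini-Hochberg adjustment. This is not the author's R code: the normal approximation, the function names, the value of `delta`, and the toy expression values are all assumptions made for illustration, and the sketch assumes each group has at least two observations with non-zero variance.

```python
import math
from statistics import mean, variance


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def threshold_p_value(x, y, delta):
    """P-value for H0: |mean(x) - mean(y)| <= delta.

    Uses a normal approximation evaluated at the boundary of the null
    (true difference equal to delta). With delta = 0 this reduces to the
    usual two-sided test of exactly equal means.
    """
    d = abs(mean(x) - mean(y))
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return normal_cdf(-(d - delta) / se) + normal_cdf((-d - delta) / se)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical log-expression values for three genes in two groups.
genes = {
    "gene_a": ([5.1, 5.3, 4.9, 5.2], [7.0, 7.2, 6.9, 7.1]),  # large shift
    "gene_b": ([5.0, 5.1, 4.9, 5.0], [5.3, 5.4, 5.2, 5.3]),  # small shift
    "gene_c": ([5.0, 5.4, 4.6, 5.1], [5.1, 4.8, 5.3, 5.0]),  # no clear shift
}
delta = 1.0  # assumed smallest difference considered biologically relevant

names = list(genes)
p_values = [threshold_p_value(x, y, delta) for x, y in genes.values()]
q_values = benjamini_hochberg(p_values)
discoveries = [n for n, q in zip(names, q_values) if q <= 0.05]
```

In this toy data, `gene_b` differs between groups by a small but consistent amount, so a test of exactly equal means (`delta = 0`) would give it a very small p-value, whereas the thresholded test does not. That is the distinction between statistical and biological relevance that the abstract draws.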
R code is freely available from http://www.davidbickel.com and may become available from www.r-project.org or www.bioconductor.org.
Applications to cancer microarrays, an application in the absence of differential expression, pseudocode, and a guide to customizing the methods may be found at www.davidbickel.com and www.mathpreprints.com.