Munneke B, Schlauch K A, Simonsen K L, Beavis W D, Doerge R W
Department of Statistics, Purdue University, West Lafayette, Indiana 47907, USA.
Genetics. 2005 Aug;170(4):2003-11. doi: 10.1534/genetics.104.031500. Epub 2005 Jun 8.
It has been well established that gene expression data contain large amounts of random variation that affects both the analysis and the results of microarray experiments. Typically, microarray data are either tested for differential expression between conditions or grouped on the basis of profiles that are assessed temporally or across genetic or environmental conditions. While testing differential expression relies on levels of certainty to evaluate the relative worth of various analyses, cluster analysis is exploratory in nature and has not had the benefit of any judgment of statistical inference. By using a novel dissimilarity function to ascertain gene expression clusters and conditional randomization of the data space to illuminate distinctions between statistically significant clusters of gene expression patterns, we aim to provide a level of confidence to inferred clusters of gene expression data. We apply both permutation and convex hull approaches for randomization of the data space and show that both methods can provide an effective assessment of gene expression profiles whose coregulation is statistically different from that expected by random chance alone.
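The permutation idea can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not specify the novel dissimilarity function or the conditional randomization scheme, so the sketch substitutes one minus the Pearson correlation as the dissimilarity and a plain, unconditional shuffle of each gene's values across conditions. The function names, the `n_perm` parameter, and the data layout (genes in rows, conditions in columns) are all assumptions made for illustration.

```python
import numpy as np


def mean_pairwise_dissimilarity(profiles):
    """Mean of (1 - Pearson correlation) over all pairs of rows.

    Stand-in dissimilarity; the paper's own function is not given
    in the abstract.
    """
    corr = np.corrcoef(profiles)
    upper = np.triu_indices(profiles.shape[0], k=1)
    return float(np.mean(1.0 - corr[upper]))


def cluster_permutation_pvalue(data, cluster_rows, n_perm=1000, seed=0):
    """Estimate how often randomized profiles are as tight as the cluster.

    data         : 2-D array, genes in rows, conditions in columns
    cluster_rows : indices of the genes assigned to one cluster
    Returns the observed mean dissimilarity and a permutation p-value.
    """
    rng = np.random.default_rng(seed)
    cluster = data[cluster_rows]
    observed = mean_pairwise_dissimilarity(cluster)

    hits = 0
    for _ in range(n_perm):
        # Shuffle each gene's values across conditions independently,
        # which breaks any shared pattern among the genes.
        permuted = rng.permuted(cluster, axis=1)
        if mean_pairwise_dissimilarity(permuted) <= observed:
            hits += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value here suggests that the genes in the cluster share a profile more closely than shuffled profiles do. The convex hull approach mentioned in the abstract would draw the randomized data differently and is not sketched here.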