Department of Chemical Engineering and Delaware Biotechnology Institute, University of Delaware, Newark, DE 19716, USA.
J Theor Biol. 2010 May 21;264(2):211-22. doi: 10.1016/j.jtbi.2010.02.021. Epub 2010 Feb 17.
Gene expression studies generate large quantities of data whose defining characteristic is that the number of genes (whose expression profiles are to be determined) exceeds the number of available replicates by several orders of magnitude. Standard spot-by-spot analysis still seeks to extract useful information for each gene on the basis of the number of available replicates, and thus plays to the weakness of microarrays. On the other hand, because of the data volume, treating the entire data set as an ensemble and developing theoretical distributions for these ensembles provides a framework that plays instead to the strength of microarrays. We present theoretical results showing that, under reasonable assumptions, the distribution of microarray intensities follows the Gamma model, with the biological interpretations of the model parameters emerging naturally. We subsequently establish that, for each microarray data set, the fractional intensities can be represented as a mixture of Beta densities, and we develop a procedure for using these results to draw statistical inferences regarding differential gene expression. We illustrate the results with experimental data from cDNA microarray studies of gene expression in Deinococcus radiodurans following DNA damage.
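The step from Gamma-distributed intensities to Beta-distributed fractional intensities can be illustrated with a small simulation. The sketch below is not the authors' procedure. It assumes that a fractional intensity is one channel's share of the two-channel total, R/(R+G), and that the two channels are independent Gamma variables with a common scale, in which case the fraction is a standard Beta variable. The function names, shape values, and scale value are hypothetical, chosen only for illustration.

```python
import numpy as np


def fractional_intensity(red, green):
    """Share of the total two-channel signal carried by the red channel.

    Assumption: "fractional intensity" is taken to mean R / (R + G).
    """
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    return red / (red + green)


def beta_moments(a, b):
    """Theoretical mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


def simulate_fractions(shape_red, shape_green, scale, n_spots, seed=0):
    """Draw independent Gamma intensities per spot and return their fractions.

    Both channels share the same scale, so the scale cancels in the ratio and
    the fraction follows Beta(shape_red, shape_green).
    """
    rng = np.random.default_rng(seed)
    red = rng.gamma(shape_red, scale, size=n_spots)
    green = rng.gamma(shape_green, scale, size=n_spots)
    return fractional_intensity(red, green)


if __name__ == "__main__":
    # Hypothetical parameter values, not taken from the paper.
    a, b = 2.0, 3.0
    x = simulate_fractions(a, b, scale=500.0, n_spots=200_000)
    mean_theory, var_theory = beta_moments(a, b)
    print("sample mean:", x.mean(), "theory:", mean_theory)
    print("sample variance:", x.var(), "theory:", var_theory)
```

A single Beta component corresponds to one pair of shape parameters. A mixture of Beta densities, as described in the abstract, would combine several such components, for example one per group of genes with similar expression behavior.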