Department of Biosystems Data Analysis, University of Amsterdam, The Netherlands.
Brief Bioinform. 2012 Sep;13(5):524-35. doi: 10.1093/bib/bbr071. Epub 2011 Dec 23.
In functional genomics, data are generated under an experimental design more often than not. The samples of the resulting data sets are therefore organized according to this design, and many biochemical compounds are measured for each sample, typically thousands of gene expression levels or hundreds of metabolites. This results in high-dimensional data sets with an underlying experimental design. Several methods have recently become available for analyzing such data while exploiting that design. We review these methods by placing them in a unifying, general framework that makes the similarities and differences between them easier to understand. The biological question dictates which method to use, and the framework allows new methods to be built to accommodate a range of such questions. The framework is built on well-known fixed-effect ANOVA models followed by dimension reduction. We present the framework both in matrix algebra and in more insightful geometrical terms. We show how the different special cases of our framework work with a real-life metabolomics example from nutritional research and a gene-expression example from the field of virology.
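The two building blocks named above, a fixed-effect ANOVA decomposition followed by dimension reduction, can be illustrated with a minimal sketch. It assumes a balanced two-factor design, uses simulated data, and applies a separate PCA (via SVD) to each effect matrix, which resembles ANOVA-simultaneous component analysis (ASCA). The function names `anova_effect_matrices` and `pca_scores_loadings`, the factor labels, and the data are hypothetical and chosen for illustration only; this is one possible special case, not the authors' implementation.

```python
import numpy as np


def anova_effect_matrices(X, factor_a, factor_b):
    """Split a samples-by-variables matrix into ANOVA effect matrices.

    Assumes a balanced two-factor fixed-effect design (illustrative only).
    Returns the overall mean row and matrices A, B, AB, E such that
    X = mean + A + B + AB + E.
    """
    X = np.asarray(X, dtype=float)
    factor_a = np.asarray(factor_a)
    factor_b = np.asarray(factor_b)

    mean = X.mean(axis=0, keepdims=True)
    Xc = X - mean  # remove the overall mean of every variable

    # Main effects: each row holds the mean of its factor level.
    A = np.zeros_like(X)
    for level in np.unique(factor_a):
        rows = factor_a == level
        A[rows] = Xc[rows].mean(axis=0)

    B = np.zeros_like(X)
    for level in np.unique(factor_b):
        rows = factor_b == level
        B[rows] = Xc[rows].mean(axis=0)

    # Interaction: cell means of what the main effects leave unexplained.
    AB = np.zeros_like(X)
    for la in np.unique(factor_a):
        for lb in np.unique(factor_b):
            rows = (factor_a == la) & (factor_b == lb)
            if rows.any():
                AB[rows] = (Xc[rows] - A[rows] - B[rows]).mean(axis=0)

    E = Xc - A - B - AB  # within-cell residuals
    return mean, A, B, AB, E


def pca_scores_loadings(M, n_components=2):
    """Dimension reduction of one effect matrix by truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    return scores, loadings


# Simulated example: 2 treatments x 3 time points x 4 replicates, 50 variables.
rng = np.random.default_rng(0)
factor_a = np.repeat(["control", "treated"], 12)
factor_b = np.tile(np.repeat(["t1", "t2", "t3"], 4), 2)
X = rng.normal(size=(24, 50))
X[factor_a == "treated", :5] += 2.0  # hypothetical treatment effect

mean, A, B, AB, E = anova_effect_matrices(X, factor_a, factor_b)
scores_a, loadings_a = pca_scores_loadings(A, n_components=1)
print(scores_a.shape, loadings_a.shape)
```

In a balanced design the effect matrices are mutually orthogonal, so the variation in the data splits cleanly over the design terms before any dimension reduction is applied. Which effect matrices are analyzed separately or jointly is the kind of choice that the biological question would determine.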