Dawson Kevin, Rodriguez Raymond L, Malyj Wasyl
Laboratory for High Performance Computing and Informatics, University of California, Davis MCB, One Shields Avenue, Davis, CA 95616, USA.
BMC Bioinformatics. 2005 Aug 2;6:195. doi: 10.1186/1471-2105-6-195.
Life processes are determined by the organism's genetic profile and multiple environmental variables. However, the interaction between these factors is inherently nonlinear. Microarray data is one representation of the nonlinear interactions among genes and between genes and environmental factors. Still, most microarray studies use linear methods to interpret nonlinear data. In this study, we apply Isomap, a nonlinear method of dimensionality reduction, to analyze three independent large Affymetrix high-density oligonucleotide microarray data sets.
Isomap discovered low-dimensional structures embedded in the Affymetrix microarray data sets. These structures correspond to, and help to interpret, biological phenomena present in the data. This analysis provides examples of temporal, spatial, and functional processes revealed by the Isomap algorithm. In a spinal cord injury data set, Isomap discovers the three main modalities of the experiment: the location of the injury, its severity, and the time elapsed after the injury. In a multiple tissue data set, Isomap discovers a low-dimensional structure that corresponds to the anatomical locations of the source tissues. This single model captures both low-resolution differences, such as kidney versus brain, and high-resolution differences, such as those between the nuclei of the amygdala. In a high-throughput drug screening data set, Isomap discovers the monocytic and granulocytic differentiation of myeloid cells and maps several chemical compounds onto the two-dimensional model.
Visualization of Isomap models provides useful tools for exploratory analysis of microarray data sets. In most instances, Isomap models explain more of the variance present in the microarray data than PCA or MDS. Finally, Isomap is a promising new algorithm for class discovery and class prediction in high-density oligonucleotide data sets.
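As a rough illustration of the kind of procedure the abstract refers to, the sketch below follows the usual three Isomap steps (a nearest-neighbor graph, shortest-path "geodesic" distances, and classical MDS on those distances) in plain Python. It is not the authors' implementation. The toy data (30 points on a circular arc), the neighborhood size `k=2`, and all function names are assumptions chosen for illustration, and the embedding is restricted to one dimension so that a simple power iteration can stand in for a full eigendecomposition.

```python
import math

def pairwise_distances(points):
    """Euclidean distance between every pair of samples."""
    return [[math.dist(p, q) for q in points] for p in points]

def geodesic_distances(dist, k):
    """Approximate geodesic distances via shortest paths in a k-nearest-neighbor graph."""
    n = len(dist)
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        # Skip position 0 of the sorted list: that is the point itself.
        neighbours = sorted(range(n), key=lambda j: dist[i][j])[1:k + 1]
        for j in neighbours:
            graph[i][j] = graph[j][i] = dist[i][j]
    # Floyd-Warshall all-pairs shortest paths.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = graph[i][m] + graph[m][j]
                if via < graph[i][j]:
                    graph[i][j] = via
    return graph

def classical_mds_1d(geo, iterations=200):
    """One-dimensional classical MDS: leading eigenvector of the double-centered matrix."""
    n = len(geo)
    sq = [[d * d for d in row] for row in geo]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    v = [math.sin(i + 1.0) for i in range(n)]  # deterministic starting vector
    for _ in range(iterations):  # power iteration
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return [math.sqrt(eigenvalue) * x for x in v]

# Toy data (an assumption, not microarray data): 30 samples along three
# quarters of a unit circle, i.e. a one-dimensional curve bent in 2-D.
n_samples = 30
points = [(math.cos(1.5 * math.pi * i / (n_samples - 1)),
           math.sin(1.5 * math.pi * i / (n_samples - 1)))
          for i in range(n_samples)]

euclidean = pairwise_distances(points)
geodesic = geodesic_distances(euclidean, k=2)
embedding = classical_mds_1d(geodesic)

print("straight-line distance between end points:", round(euclidean[0][-1], 3))
print("graph (geodesic) distance between end points:", round(geodesic[0][-1], 3))
print("span of the 1-D Isomap embedding:", round(max(embedding) - min(embedding), 3))
```

On this toy curve the straight-line distance between the two end points is much shorter than the distance measured along the curve. This is the kind of nonlinear structure that a linear method applied to raw distances cannot unfold, whereas the one-dimensional embedding keeps the samples in their order along the curve.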