Nilsson Jens, Fioretos Thoas, Höglund Mattias, Fontes Magnus
Centre for Mathematical Sciences, Lund University, Box 118, SE-221 00 Lund, Sweden.
Bioinformatics. 2004 Apr 12;20(6):874-80. doi: 10.1093/bioinformatics/btg496. Epub 2004 Jan 29.
Genome-wide gene expression measurements, as currently obtained with microarray technology, can be represented mathematically as points in a high-dimensional gene expression space. Genes interact with each other in regulatory networks, restricting the cellular gene expression profiles to a certain manifold, or surface, in gene expression space. To obtain knowledge about this manifold, various dimensionality reduction methods and distance metrics are used. For data points distributed on curved manifolds, a sensible distance measure would be the geodesic distance along the manifold. In this work, we examine whether an approximate geodesic distance measure captures biological similarities better than the traditionally used Euclidean distance.
We computed approximate geodesic distances, determined by the Isomap algorithm, for one set of lymphoma and one set of lung cancer microarray samples. Compared with the ordinary Euclidean distance metric, this distance measure produced more instructive and biologically relevant visualizations when multidimensional scaling was applied. This suggests that the Isomap algorithm is a promising tool for the interpretation of microarray data. Furthermore, the results demonstrate the benefit and importance of taking nonlinearities in gene expression data into account.
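The difference between the two distance measures can be illustrated with a minimal sketch. It is not the authors' implementation and uses no microarray data: the toy points on a half circle, the neighbourhood size `k`, and the function names are all assumptions made for illustration. It follows the general Isomap recipe of connecting each point to its nearest neighbours and taking shortest paths through the resulting graph as approximate geodesic distances.

```python
import math


def euclidean_matrix(points):
    """Pairwise Euclidean distances between all points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def approx_geodesic_matrix(points, k):
    """Approximate geodesic distances via a k-nearest-neighbour graph.

    Each point is linked to its k nearest neighbours (edges are made
    symmetric), and all-pairs shortest paths are found with the
    Floyd-Warshall algorithm. Pairs in disconnected components keep
    the distance math.inf.
    """
    n = len(points)
    d = euclidean_matrix(points)
    g = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]

    # Build the neighbourhood graph.
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])
        for j in others[:k]:
            g[i][j] = d[i][j]
            g[j][i] = d[i][j]

    # Floyd-Warshall: relax every pair through every intermediate node.
    for m in range(n):
        for i in range(n):
            if g[i][m] == math.inf:
                continue
            for j in range(n):
                alt = g[i][m] + g[m][j]
                if alt < g[i][j]:
                    g[i][j] = alt
    return g


# Hypothetical toy data: 21 points evenly spaced on a unit half circle.
points = [(math.cos(math.pi * t / 20), math.sin(math.pi * t / 20)) for t in range(21)]

euclid = euclidean_matrix(points)
geo = approx_geodesic_matrix(points, k=2)

print("Euclidean distance between the end points:", euclid[0][20])
print("Approximate geodesic distance between the end points:", geo[0][20])
```

For the two end points of the half circle, the Euclidean distance is the straight chord across the curve, while the graph-based distance follows the curve and comes close to its arc length. In an Isomap-style analysis, the matrix `geo` would then replace the Euclidean distance matrix as the input to multidimensional scaling.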