Spectral embedding finds meaningful (relevant) structure in image and microarray data.

Author information

Higgs Brandon W, Weller Jennifer, Solka Jeffrey L

Affiliation

School of Computational Sciences, George Mason University, Manassas, VA 20110, USA.

Publication information

BMC Bioinformatics. 2006 Feb 16;7:74. doi: 10.1186/1471-2105-7-74.

Abstract

BACKGROUND

Accurate methods for extracting meaningful patterns from high-dimensional data have become increasingly important with the recent generation of data types containing measurements across thousands of variables. Principal components analysis (PCA) is a linear dimensionality reduction (DR) method that is unsupervised in that it relies only on the data; projections are calculated in Euclidean or a similar linear space and do not use tuning parameters to optimize the fit to the data. However, relationships within nonlinear data types, such as biological networks or images, are frequently misrepresented when linear methods map them into a low-dimensional space. Nonlinear methods, in contrast, attempt to model important aspects of the underlying data structure and often require parameters to be fitted to the data type of interest. In many cases, the optimal parameter values vary when different classification algorithms are applied to the same rendered subspace, making the results of such methods highly dependent on the type of classifier used.
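
A minimal PCA sketch helps make the point about linear DR concrete. It assumes NumPy and samples stored as rows; the function name `pca` and the toy data are hypothetical. The data are centered and projected onto the leading right singular vectors, and the only choice left to the user is the number of components `k`; there is no kernel width or neighborhood size to tune.

```python
import numpy as np


def pca(X: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Project the rows of X (n samples x p variables) onto the top-k principal axes.

    Returns the n x k matrix of scores and the fraction of total variance
    carried by every component (not only the first k).
    """
    Xc = X - X.mean(axis=0)                  # center each variable
    # Thin SVD of the centered data: rows of Vt are the principal axes,
    # ordered by decreasing singular value s.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T                   # a purely linear projection
    explained = s ** 2 / np.sum(s ** 2)      # variance share per component
    return scores, explained


# Toy data (assumed for illustration): 50 samples lying on a line in 3-D
rng = np.random.default_rng(0)
t = rng.normal(size=50)
X = np.column_stack([t, 2 * t, -t])
Z, ratio = pca(X, k=1)   # one component captures essentially all the variance
```

Because the projection is linear, points that lie on a curved structure are flattened along straight axes, which is the kind of mis-rendering the paragraph above describes for nonlinear data.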

RESULTS

We present the results of applying the spectral method of Lafon, a nonlinear DR method based on the weighted graph Laplacian that minimizes the requirements for such parameter optimization, to two biological data types. Compared with two conventional linear methods and three nonlinear methods (one of which is an alternative spectral method), we demonstrate that it succeeds in determining the implicit ordering of brain slice image data and in classifying separate species in microarray data. By preserving important relationships, this spectral implementation is shown to provide more meaningful information than the DR methods presented for comparison. For the two datasets analyzed here, tuning-parameter fitting is simple and general rather than specific to a data type or experiment. Because tuning-parameter optimization is minimized in the DR step for each subsequent classification method, valid cross-experiment comparisons become possible.
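
The abstract does not spell out the algorithm, so the following is only a sketch of a diffusion-map-style embedding in the spirit of the Lafon method it cites, not the authors' implementation. A Gaussian-weighted graph is built from pairwise distances, normalized into a random walk, and the leading nontrivial eigenvectors become the new coordinates. The Gaussian kernel, the function name `diffusion_map`, its parameters `eps`, `t` and `alpha`, and the toy semicircle data are all assumptions made for illustration, not the settings used in the paper.

```python
import numpy as np


def diffusion_map(X, eps, n_components=2, t=1, alpha=1.0):
    """Diffusion-map-style spectral embedding of the rows of X (n x p).

    eps   : width of the Gaussian kernel (the tuning parameter of this sketch)
    t     : diffusion time; each coordinate is scaled by eigenvalue**t
    alpha : density normalization; alpha=1 reduces the effect of sampling density
    """
    # Pairwise squared Euclidean distances (clipped at 0 to absorb round-off)
    sq = np.sum(X ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-D2 / eps)                      # Gaussian edge weights of the graph
    q = W.sum(axis=1)
    K = W / np.outer(q ** alpha, q ** alpha)   # alpha-normalized, still symmetric
    d = K.sum(axis=1)
    # A = D^{-1/2} K D^{-1/2} is symmetric and shares its eigenvalues with the
    # random-walk matrix P = D^{-1} K; P has the same eigenvectors as the
    # random-walk graph Laplacian I - P, so eigh on A is enough.
    A = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)             # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]           # right eigenvectors of P
    # Skip the trivial constant eigenvector (eigenvalue 1), scale by lambda**t
    keep = slice(1, n_components + 1)
    return psi[:, keep] * vals[keep] ** t


# Toy data (assumed): points along a semicircle, presented in shuffled order
rng = np.random.default_rng(1)
s = np.linspace(0.0, 1.0, 60)                  # hidden position along the curve
perm = rng.permutation(60)
X = np.column_stack([np.cos(np.pi * s), np.sin(np.pi * s)])[perm]
emb = diffusion_map(X, eps=0.05)
recovered = np.argsort(emb[:, 0])              # ordering along the curve, up to reversal
```

Sorting the samples by the first diffusion coordinate recovers their position along the curve up to a reversal, a toy analogue of recovering the implicit ordering of image slices. In this sketch `eps` is the single tuning parameter.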

CONCLUSION

Results from the spectral method presented here exhibit the desirable properties of preserving meaningful nonlinear relationships in a lower-dimensional space and of requiring minimal parameter fitting. The method therefore provides a useful algorithm for visualization and classification across diverse datasets, a common challenge in systems biology.
