Spectral embedding finds meaningful (relevant) structure in image and microarray data.

Author information

Higgs Brandon W, Weller Jennifer, Solka Jeffrey L

Affiliation

School of Computational Sciences, George Mason University, Manassas, VA 20110, USA.

Publication information

BMC Bioinformatics. 2006 Feb 16;7:74. doi: 10.1186/1471-2105-7-74.

DOI:10.1186/1471-2105-7-74

PMID:16483359

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1395341/

Abstract

BACKGROUND

Accurate methods for extraction of meaningful patterns in high dimensional data have become increasingly important with the recent generation of data types containing measurements across thousands of variables. Principal components analysis (PCA) is a linear dimensionality reduction (DR) method that is unsupervised in that it relies only on the data; projections are calculated in Euclidean or a similar linear space and do not use tuning parameters for optimizing the fit to the data. However, relationships within sets of nonlinear data types, such as biological networks or images, are frequently mis-rendered into a low dimensional space by linear methods. Nonlinear methods, in contrast, attempt to model important aspects of the underlying data structure, often requiring parameter(s) fitting to the data type of interest. In many cases, the optimal parameter values vary when different classification algorithms are applied on the same rendered subspace, making the results of such methods highly dependent upon the type of classifier implemented.
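The linear projection that PCA performs can be illustrated with a short sketch. This is not code from the paper: it assumes NumPy, the function name `pca_project` and the toy data are hypothetical, and it uses the standard SVD formulation of PCA. Note that nothing is tuned here; the projection is determined entirely by the data.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components.

    Illustrative helper (hypothetical name), using the SVD of the
    mean-centered data matrix.
    """
    Xc = X - X.mean(axis=0)                      # center each variable
    # Rows of Vt are the principal directions, ordered by variance explained.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # linear projection, no tuning parameter

# Toy data (an assumption for illustration): three variables driven by one
# hidden linear factor t, plus a small amount of noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2.0 * t + 0.01 * rng.normal(size=200),
    0.01 * rng.normal(size=200),
])
Y = pca_project(X, n_components=1)
```

Because the toy structure is linear, the single retained component tracks the hidden factor closely. When the underlying structure is curved, the same linear projection can place distant points next to each other, which is the mis-rendering the abstract refers to.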

RESULTS

We present the results of applying the spectral method of Lafon, a nonlinear DR method based on the weighted graph Laplacian, that minimizes the requirements for such parameter optimization for two biological data types. We demonstrate that it is successful in determining implicit ordering of brain slice image data and in classifying separate species in microarray data, as compared to two conventional linear methods and three nonlinear methods (one of which is an alternative spectral method). This spectral implementation is shown to provide more meaningful information, by preserving important relationships, than the methods of DR presented for comparison. Tuning parameter fitting is simple and is a general, rather than data type or experiment specific approach, for the two datasets analyzed here. Tuning parameter optimization is minimized in the DR step to each subsequent classification method, enabling the possibility of valid cross-experiment comparisons.
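To make the idea of an embedding based on a weighted graph Laplacian concrete, here is a minimal sketch in the spirit of diffusion maps. It is not the authors' implementation. The Gaussian kernel, the bandwidth name `epsilon`, the function name `spectral_embed`, and the toy half-circle data are all assumptions for illustration, and the sketch omits the additional normalizations that Lafon's method may apply.

```python
import numpy as np

def spectral_embed(X, epsilon, n_components=2):
    """Embed the rows of X using eigenvectors of a random-walk graph operator.

    Illustrative sketch (hypothetical name). epsilon is the single tuning
    parameter: the bandwidth of the assumed Gaussian kernel.
    """
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-D2 / epsilon)                 # edge weights of the graph
    d = W.sum(axis=1)                         # weighted degrees
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}; it shares eigenvalues
    # with the random-walk matrix P = D^{-1} W.
    S = W / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)            # eigenvalues in ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]    # largest first
    psi = vecs / np.sqrt(d)[:, None]          # right eigenvectors of P
    # Skip the trivial constant eigenvector (eigenvalue 1).
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1]

# Toy data (an assumption): points along a half circle, presented in
# shuffled order, so the ordering along the curve is implicit.
rng = np.random.default_rng(0)
t = rng.permutation(np.linspace(0.0, np.pi, 60))
X = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
Y = spectral_embed(X, epsilon=0.05, n_components=2)
```

In this toy case the first embedding coordinate varies smoothly with position along the curve, so sorting by it should largely recover the hidden ordering. This is loosely analogous to the brain slice ordering task described above, although the real data are far higher dimensional.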

CONCLUSION

Results from the spectral method presented here exhibit the desirable properties of preserving meaningful nonlinear relationships in lower dimensional space and requiring minimal parameter fitting, providing a useful algorithm for purposes of visualization and classification across diverse datasets, a common challenge in systems biology.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/11eb/1395341/6ccfd26e9f97/1471-2105-7-74-1.jpg

Similar articles

Visualization methods for statistical analysis of microarray clusters.

BMC Bioinformatics. 2005 May 12;6:115. doi: 10.1186/1471-2105-6-115.

Feature selection and nearest centroid classification for protein mass spectrometry.

BMC Bioinformatics. 2005 Mar 23;6:68. doi: 10.1186/1471-2105-6-68.

Quadratic regression analysis for gene discovery and pattern recognition for non-cyclic short time-course microarray experiments.

BMC Bioinformatics. 2005 Apr 25;6:106. doi: 10.1186/1471-2105-6-106.

On mining micro-array data by Order-Preserving Submatrix.

Int J Bioinform Res Appl. 2007;3(1):42-64. doi: 10.1504/IJBRA.2007.011834.

Principal surfaces from unsupervised kernel regression.

IEEE Trans Pattern Anal Mach Intell. 2005 Sep;27(9):1379-91. doi: 10.1109/TPAMI.2005.183.

Gene expression data classification using locally linear discriminant embedding.

Comput Biol Med. 2010 Oct;40(10):802-10. doi: 10.1016/j.compbiomed.2010.08.003. Epub 2010 Sep 22.

Exploring nonlinear feature space dimension reduction and data representation in breast Cadx with Laplacian eigenmaps and t-SNE.

Med Phys. 2010 Jan;37(1):339-51. doi: 10.1118/1.3267037.

Sample phenotype clusters in high-density oligonucleotide microarray data sets are revealed using Isomap, a nonlinear algorithm.

BMC Bioinformatics. 2005 Aug 2;6:195. doi: 10.1186/1471-2105-6-195.

Principal geodesic analysis for the study of nonlinear statistics of shape.

IEEE Trans Med Imaging. 2004 Aug;23(8):995-1005. doi: 10.1109/TMI.2004.831793.

Cited by

Dimensionality reduction-based fusion approaches for imaging and non-imaging biomedical data: concepts, workflow, and use-cases.

BMC Med Imaging. 2017 Jan 5;17(1):2. doi: 10.1186/s12880-016-0172-6.

Content-based image retrieval of digitized histopathology in boosted spectrally embedded spaces.

J Pathol Inform. 2015 Jun 29;6:41. doi: 10.4103/2153-3539.159441. eCollection 2015.

A method for processing multivariate data in medical studies.

Stat Med. 2013 Sep 10;32(20):3436-48. doi: 10.1002/sim.5788. Epub 2013 Mar 31.

A white-box approach to microarray probe response characterization: the BaFL pipeline.

BMC Bioinformatics. 2009 Dec 29;10:449. doi: 10.1186/1471-2105-10-449.

Non-linear dimensionality reduction of signaling networks.

BMC Syst Biol. 2007 Jun 8;1:27. doi: 10.1186/1752-0509-1-27.

References

Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps.

Proc Natl Acad Sci U S A. 2005 May 24;102(21):7426-31. doi: 10.1073/pnas.0500334102. Epub 2005 May 17.

Geometric diffusions as a tool for harmonic analysis and structure definition of data: multiscale methods.

Proc Natl Acad Sci U S A. 2005 May 24;102(21):7432-7. doi: 10.1073/pnas.0500896102. Epub 2005 May 17.

Mapping high-dimensional data onto a relative distance plane--an exact method for visualizing and characterizing high-dimensional patterns.

J Biomed Inform. 2004 Oct;37(5):366-79. doi: 10.1016/j.jbi.2004.07.005.

Approximate geodesic distances reveal biologically relevant structures in microarray data.

Bioinformatics. 2004 Apr 12;20(6):874-80. doi: 10.1093/bioinformatics/btg496. Epub 2004 Jan 29.

Comparative analysis of gene-expression patterns in human and African great ape cultured fibroblasts.

Genome Res. 2003 Jul;13(7):1619-30. doi: 10.1101/gr.1289803.

Nonlinear dimensionality reduction by locally linear embedding.

Science. 2000 Dec 22;290(5500):2323-6. doi: 10.1126/science.290.5500.2323.

A global geometric framework for nonlinear dimensionality reduction.

Science. 2000 Dec 22;290(5500):2319-23. doi: 10.1126/science.290.5500.2319.

Cluster analysis and display of genome-wide expression patterns.

Proc Natl Acad Sci U S A. 1998 Dec 8;95(25):14863-8. doi: 10.1073/pnas.95.25.14863.

Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization.

Mol Biol Cell. 1998 Dec;9(12):3273-97. doi: 10.1091/mbc.9.12.3273.

A genome-wide transcriptional analysis of the mitotic cell cycle.

Mol Cell. 1998 Jul;2(1):65-73. doi: 10.1016/s1097-2765(00)80114-8.
