Sample phenotype clusters in high-density oligonucleotide microarray data sets are revealed using Isomap, a nonlinear algorithm.

Authors

Dawson Kevin, Rodriguez Raymond L, Malyj Wasyl

Affiliation

Laboratory for High Performance Computing and Informatics, University of California, Davis MCB, One Shields Avenue, Davis, CA 95616, USA.

Publication

BMC Bioinformatics. 2005 Aug 2;6:195. doi: 10.1186/1471-2105-6-195.

DOI: 10.1186/1471-2105-6-195

PMID: 16076401

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1189082/

Abstract

BACKGROUND

Life processes are determined by the organism's genetic profile and multiple environmental variables. However, the interaction between these factors is inherently nonlinear. Microarray data are one representation of the nonlinear interactions among genes and between genes and environmental factors. Still, most microarray studies use linear methods to interpret nonlinear data. In this study, we apply Isomap, a nonlinear method of dimensionality reduction, to analyze three independent large Affymetrix high-density oligonucleotide microarray data sets.
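For readers unfamiliar with the method, the following is a minimal NumPy sketch of the general Isomap procedure: build a nearest-neighbor graph, take shortest-path lengths as approximate geodesic distances, and apply classical MDS to them. It is not the authors' implementation. The function name `isomap`, its parameter defaults, and the toy helix that stands in for expression profiles are assumptions made purely for illustration.

```python
import numpy as np

def isomap(X, n_neighbors=4, n_components=2):
    """Basic Isomap on the rows of X (samples x features); illustrative only."""
    n = X.shape[0]
    # Pairwise Euclidean distances between samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Neighborhood graph: keep only edges to each sample's nearest neighbors.
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    nearest = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        graph[i, nearest[i]] = dist[i, nearest[i]]
    graph = np.minimum(graph, graph.T)  # make the graph symmetric

    # Approximate geodesic distances: all-pairs shortest paths (Floyd-Warshall).
    for k in range(n):
        graph = np.minimum(graph, graph[:, k:k + 1] + graph[k:k + 1, :])
    if np.isinf(graph).any():
        raise ValueError("neighborhood graph is disconnected; raise n_neighbors")

    # Classical MDS on the geodesic distances.
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (graph ** 2) @ J              # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

# Toy data (hypothetical): 80 "samples" along a helix, a curved 1-D structure
# embedded in 3-D, standing in for samples ordered by a hidden variable.
t = np.linspace(0.0, 4.0 * np.pi, 80)
X = np.column_stack([np.cos(t), np.sin(t), 0.2 * t])

embedding = isomap(X, n_neighbors=4, n_components=1)
# The single Isomap coordinate should track position along the curve.
corr = abs(np.corrcoef(embedding[:, 0], t)[0, 1])
print(f"|correlation| between Isomap coordinate and curve position: {corr:.3f}")
```

The neighborhood size is the main tuning choice in this sketch: too small and the graph disconnects, too large and edges shortcut across the curve so that the geodesic estimates collapse toward plain Euclidean distances.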

RESULTS

Isomap discovered low-dimensional structures embedded in the Affymetrix microarray data sets. These structures correspond to, and help to interpret, biological phenomena present in the data. The analysis provides examples of temporal, spatial, and functional processes revealed by the Isomap algorithm. In a spinal cord injury data set, Isomap discovers the three main modalities of the experiment: the location of the injury, its severity, and the time elapsed after the injury. In a multiple tissue data set, Isomap discovers a low-dimensional structure that corresponds to the anatomical locations of the source tissues. A single model captures both low-resolution differences, such as kidney versus brain, and high-resolution differences, such as those between the nuclei of the amygdala. In a high-throughput drug screening data set, Isomap discovers the monocytic and granulocytic differentiation of myeloid cells and maps several chemical compounds onto the two-dimensional model.

CONCLUSION

Visualization of Isomap models provides useful tools for exploratory analysis of microarray data sets. In most instances, Isomap models explain more of the variance present in the microarray data than PCA or MDS. Finally, Isomap is a promising new algorithm for class discovery and class prediction in high-density oligonucleotide data sets.
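One way to make the comparison with linear methods concrete is the residual variance used in the general Isomap literature: one minus the squared correlation between the distances a method tries to preserve and the distances in its embedding. The sketch below assumes that measure, which the abstract does not specify, and applies it to a toy helix rather than to microarray data. All function names are hypothetical. Classical MDS on Euclidean distances, which yields the same coordinates as PCA, serves as the linear baseline.

```python
import numpy as np

def pairwise(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def geodesic(dist, n_neighbors):
    """Shortest-path distances over a symmetric k-nearest-neighbor graph."""
    n = dist.shape[0]
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    nearest = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    graph[rows, nearest.ravel()] = dist[rows, nearest.ravel()]
    graph = np.minimum(graph, graph.T)
    for k in range(n):  # Floyd-Warshall
        graph = np.minimum(graph, graph[:, k:k + 1] + graph[k:k + 1, :])
    return graph

def classical_mds(dist, n_components):
    """Classical (metric) MDS coordinates from a distance matrix."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (dist ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

def residual_variance(target_dist, embedding):
    """1 - R^2 between target distances and embedding distances."""
    iu = np.triu_indices(target_dist.shape[0], k=1)
    r = np.corrcoef(target_dist[iu], pairwise(embedding)[iu])[0, 1]
    return 1.0 - r ** 2

# Toy data (hypothetical): a helix, i.e. a curved 1-D structure in 3-D.
t = np.linspace(0.0, 4.0 * np.pi, 80)
X = np.column_stack([np.cos(t), np.sin(t), 0.2 * t])

euclid = pairwise(X)
geo = geodesic(euclid, n_neighbors=4)

# One-dimensional embeddings from the linear baseline and from Isomap.
rv_mds = residual_variance(euclid, classical_mds(euclid, 1))
rv_isomap = residual_variance(geo, classical_mds(geo, 1))
print(f"residual variance, classical MDS / PCA: {rv_mds:.3f}")
print(f"residual variance, Isomap:              {rv_isomap:.3f}")
```

On data with genuinely curved structure, the Isomap value is expected to be the lower of the two; on data that are close to linear, the two methods should give similar results.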

Similar articles

Mining the structural knowledge of high-dimensional medical data using isomap.

Med Biol Eng Comput. 2005 May;43(3):410-2. doi: 10.1007/BF02345820.

Spectral embedding finds meaningful (relevant) structure in image and microarray data.

BMC Bioinformatics. 2006 Feb 16;7:74. doi: 10.1186/1471-2105-7-74.

[Isomap-PLS nonlinear modeling method for near infrared spectroscopy].

Guang Pu Xue Yu Guang Pu Fen Xi. 2009 Feb;29(2):322-6.

Comparative study of unsupervised dimension reduction techniques for the visualization of microarray gene expression data.

BMC Bioinformatics. 2010 Nov 18;11:567. doi: 10.1186/1471-2105-11-567.

Gene expression analysis in clear cell renal cell carcinoma using gene set enrichment analysis for biostatistical management.

BJU Int. 2011 Jul;108(2 Pt 2):E29-35. doi: 10.1111/j.1464-410X.2010.09794.x. Epub 2011 Mar 16.

Non-linear dimensionality reduction of signaling networks.

BMC Syst Biol. 2007 Jun 8;1:27. doi: 10.1186/1752-0509-1-27.

Nonlinear dimensionality reduction methods for synthetic biology biobricks' visualization.

BMC Bioinformatics. 2017 Jan 19;18(1):47. doi: 10.1186/s12859-017-1484-4.

Finding multiple coherent biclusters in microarray data using variable string length multiobjective genetic algorithm.

IEEE Trans Inf Technol Biomed. 2009 Nov;13(6):969-75. doi: 10.1109/TITB.2009.2017527. Epub 2009 Mar 16.

Many accurate small-discriminatory feature subsets exist in microarray transcript data: biomarker discovery.

BMC Bioinformatics. 2005 Apr 13;6:97. doi: 10.1186/1471-2105-6-97.

Cited by

Unsupervised Algorithms for Microarray Sample Stratification.

Methods Mol Biol. 2022;2401:121-146. doi: 10.1007/978-1-0716-1839-4_9.

Abundance and Expression of Shiga Toxin Genes in Escherichia coli at the Recto-Anal Junction Relates to Host Immune Genes.

Front Cell Infect Microbiol. 2021 Mar 17;11:633573. doi: 10.3389/fcimb.2021.633573. eCollection 2021.

Adaptive Dimensionality Reduction with Semi-Supervision (AdDReSS): Classifying Multi-Attribute Biomedical Data.

PLoS One. 2016 Jul 15;11(7):e0159088. doi: 10.1371/journal.pone.0159088. eCollection 2016.

A Review of Feature Selection and Feature Extraction Methods Applied on Microarray Data.

Adv Bioinformatics. 2015;2015:198363. doi: 10.1155/2015/198363. Epub 2015 Jun 11.

An algorithm for finding biologically significant features in microarray data based on a priori manifold learning.

PLoS One. 2014 Mar 3;9(3):e90562. doi: 10.1371/journal.pone.0090562. eCollection 2014.

Endometrial gene expression profiling in pregnant Meishan and Yorkshire pigs on day 12 of gestation.

BMC Genomics. 2014 Feb 24;15:156. doi: 10.1186/1471-2164-15-156.

Consensus embedding: theory, algorithms and application to segmentation and classification of biomedical data.

BMC Bioinformatics. 2012 Feb 8;13:26. doi: 10.1186/1471-2105-13-26.

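Comparative study of unsupervised dimension reduction techniques for the visualization of microarray gene expression data.

BMC Bioinformatics. 2010 Nov 18;11:567. doi: 10.1186/1471-2105-11-567.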

Investigating the efficacy of nonlinear dimensionality reduction schemes in classifying gene and protein expression studies.

IEEE/ACM Trans Comput Biol Bioinform. 2008 Jul-Sep;5(3):368-84. doi: 10.1109/TCBB.2008.36.

Complexity of type 2 diabetes mellitus data sets emerging from nutrigenomic research: a case for dimensionality reduction?

Mutat Res. 2007 Sep 1;622(1-2):19-32. doi: 10.1016/j.mrfmmm.2007.02.033. Epub 2007 May 5.
