AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number.

Affiliation

Biomolecular Science and Engineering Program, University of California, Santa Barbara, CA 93106, USA.

Publication information

BMC Bioinformatics. 2010 Mar 4;11:117. doi: 10.1186/1471-2105-11-117.

DOI: 10.1186/1471-2105-11-117

PMID: 20202218

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2846907/

Abstract

BACKGROUND

Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the underlying structure of these natural datasets is often fuzzy, and the computational identification of data clusters generally requires knowledge about cluster number and geometry.

RESULTS

We integrated strategies from machine learning, cartography, and graph theory into a new informatics method for automatically clustering self-organizing map ensembles of high-dimensional data. Our new method, called AutoSOME, readily identifies discrete and fuzzy data clusters without prior knowledge of cluster number or structure in diverse datasets including whole genome microarray data. Visualization of AutoSOME output using network diagrams and differential heat maps reveals unexpected variation among well-characterized cancer cell lines. Co-expression analysis of data from human embryonic and induced pluripotent stem cells using AutoSOME identifies >3400 up-regulated genes associated with pluripotency, and indicates that a recently identified protein-protein interaction network characterizing pluripotency was underestimated by a factor of four.

CONCLUSIONS

By effectively extracting important information from high-dimensional microarray data without prior knowledge or the need for data filtration, AutoSOME can yield systems-level insights from whole genome microarray expression studies. Due to its generality, this new method should also have practical utility for a variety of data-intensive applications, including the results of deep sequencing experiments. AutoSOME is available for download at http://jimcooperlab.mcdb.ucsb.edu/autosome.
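
The abstract gives no algorithmic detail, so the following is only a generic illustration of the idea of clustering an ensemble of self-organizing maps without fixing the cluster number in advance. It is not the AutoSOME implementation: the tiny 1-D SOM, the co-association matrix, the 0.5 threshold, the synthetic data, and every function name (`som_assign`, `co_association`, `graph_clusters`) are assumptions made for this sketch, and the cartography-inspired step mentioned in the abstract is not modelled at all.

```python
import numpy as np


def som_assign(data, n_nodes, n_iter, rng):
    """Train a tiny 1-D self-organizing map; return each row's best-matching node."""
    # Initialise node weights from randomly chosen data rows.
    nodes = data[rng.integers(len(data), size=n_nodes)].astype(float)
    grid = np.arange(n_nodes)
    for t in range(n_iter):
        decay = 1.0 - t / n_iter
        lr = 0.5 * decay                              # learning rate shrinks over time
        radius = max(0.5 * n_nodes * decay, 0.5)      # neighbourhood shrinks over time
        x = data[rng.integers(len(data))]             # one random training sample
        bmu = np.argmin(((nodes - x) ** 2).sum(axis=1))
        pull = np.exp(-((grid - bmu) ** 2) / (2.0 * radius ** 2))
        nodes += lr * pull[:, None] * (x - nodes)
    dists = ((data[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def co_association(data, n_runs=20, n_nodes=4, n_iter=200, seed=0):
    """Fraction of SOM runs in which each pair of rows shares a best-matching node."""
    rng = np.random.default_rng(seed)
    n = len(data)
    counts = np.zeros((n, n))
    for _ in range(n_runs):
        labels = som_assign(data, n_nodes, n_iter, rng)
        counts += labels[:, None] == labels[None, :]
    return counts / n_runs


def graph_clusters(coassoc, threshold=0.5):
    """Label connected components of the graph linking rows with co-association >= threshold."""
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i, j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Synthetic "expression" matrix (hypothetical data): two groups of five rows.
demo_rng = np.random.default_rng(1)
expr = np.vstack([demo_rng.normal(0.0, 0.1, (5, 3)),
                  demo_rng.normal(5.0, 0.1, (5, 3))])
coassoc = co_association(expr)
clusters = graph_clusters(coassoc)
print(len(set(clusters)), "clusters found")
```

In this sketch the number of clusters is never supplied; it falls out of how many connected components the thresholded co-association graph has. The threshold and the ensemble size are tuning choices of the illustration, not parameters reported in the abstract.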
