An overview of clustering applied to molecular biology.

Author information

Nugent Rebecca, Meila Marina

Affiliations

Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, USA.

Publication information

Methods Mol Biol. 2010;620:369-404. doi: 10.1007/978-1-60761-580-4_12.

PMID:20652512

Abstract

In molecular biology, we are often interested in determining the group structure in, e.g., a population of cells or microarray gene expression data. Clustering methods identify groups of similar observations, but the results can depend on the chosen method's assumptions and starting parameter values. In this chapter, we give a broad overview of both attribute- and similarity-based clustering, describing both the methods and their performance. The parametric and nonparametric approaches presented vary in whether or not they require knowing the number of clusters in advance as well as the shapes of the estimated clusters. Additionally, we include a biclustering algorithm that incorporates variable selection into the clustering procedure. We finish with a discussion of some common methods for comparing two clustering solutions (possibly from different methods). The user is advised to devote time and attention to determining the appropriate clustering approach (and any corresponding parameter values) for the specific application prior to analysis.
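The abstract ends by mentioning methods for comparing two clustering solutions. The sketch below computes two widely used comparison measures for partitions of the same observations: the adjusted Rand index and the variation of information. These are the standard textbook definitions, and the chapter's own formulations may differ. The code assumes that each clustering is given as a list of hashable labels, one per observation, in the same order in both lists. The example label lists are hypothetical.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index (Hubert and Arabie form) between two partitions.

    Assumes labels_a[k] and labels_b[k] are the cluster labels of the same
    observation k; label values themselves are arbitrary and need not match.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same observations")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare; treat as trivial agreement
    # Contingency counts n_ij and the row/column marginals a_i, b_j.
    joint = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)
    cols = Counter(labels_b)
    index = sum(comb(c, 2) for c in joint.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (index - expected) / (maximum - expected)


def variation_of_information(labels_a, labels_b):
    """Variation of information VI = H(A) + H(B) - 2 I(A; B), in nats."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same observations")
    n = len(labels_a)
    # Empirical cluster proportions and joint proportions.
    pa = {k: c / n for k, c in Counter(labels_a).items()}
    pb = {k: c / n for k, c in Counter(labels_b).items()}
    pab = {k: c / n for k, c in Counter(zip(labels_a, labels_b)).items()}
    h_a = -sum(p * log(p) for p in pa.values())
    h_b = -sum(p * log(p) for p in pb.values())
    mutual = sum(p * log(p / (pa[i] * pb[j])) for (i, j), p in pab.items())
    return h_a + h_b - 2 * mutual


if __name__ == "__main__":
    # Hypothetical labels for six observations from three clustering runs.
    run1 = [0, 0, 1, 1, 2, 2]
    run2 = ["x", "x", "y", "y", "z", "z"]  # same grouping, different names
    run3 = [0, 0, 0, 1, 1, 1]             # a coarser, different grouping
    print(adjusted_rand_index(run1, run2), variation_of_information(run1, run2))
    print(adjusted_rand_index(run1, run3), variation_of_information(run1, run3))
```

The two measures behave differently:

* **Adjusted Rand index.** Identical groupings under different label names give an ARI of 1. Agreement no better than chance gives an ARI near 0 on average.
* **Variation of information.** VI is a distance. It is (numerically) 0 for identical groupings and grows as the partitions disagree.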

Similar articles

Detecting clusters of different geometrical shapes in microarray gene expression data.

Bioinformatics. 2005 May 1;21(9):1927-34. doi: 10.1093/bioinformatics/bti251. Epub 2005 Jan 12.

Computing the maximum similarity bi-clusters of gene expression data.

Bioinformatics. 2007 Jan 1;23(1):50-6. doi: 10.1093/bioinformatics/btl560. Epub 2006 Nov 7.

Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values.

J Biomed Inform. 2010 Aug;43(4):560-8. doi: 10.1016/j.jbi.2010.02.001. Epub 2010 Feb 6.

Overview on techniques in cluster analysis.

Methods Mol Biol. 2010;593:81-107. doi: 10.1007/978-1-60327-194-3_5.

Knowledge based cluster ensemble for cancer discovery from biomolecular data.

IEEE Trans Nanobioscience. 2011 Jun;10(2):76-85. doi: 10.1109/TNB.2011.2144997. Epub 2011 Jul 7.

Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm.

Bioinformatics. 2006 Jan 1;22(1):58-67. doi: 10.1093/bioinformatics/bti746. Epub 2005 Oct 27.

Towards clustering of incomplete microarray data without the use of imputation.

Bioinformatics. 2007 Jan 1;23(1):107-13. doi: 10.1093/bioinformatics/btl555. Epub 2006 Oct 31.

Comparing algorithms for clustering of expression data: how to assess gene clusters.

Methods Mol Biol. 2009;541:479-509. doi: 10.1007/978-1-59745-243-4_21.

Model-based clustering on the unit sphere with an illustration using gene expression profiles.

Biostatistics. 2008 Jan;9(1):66-80. doi: 10.1093/biostatistics/kxm012. Epub 2007 Apr 27.

Cited by

A machine learning model and identification of immune infiltration for chronic obstructive pulmonary disease based on disulfidptosis-related genes.

BMC Med Genomics. 2025 Jan 8;18(1):7. doi: 10.1186/s12920-024-02076-2.

Piikun: an information theoretic toolkit for analysis and visualization of species delimitation metric space.

BMC Bioinformatics. 2024 Dec 18;25(1):385. doi: 10.1186/s12859-024-05997-y.

Starling: Introducing a mesoscopic scale with Confluence for Graph Clustering.

PLoS One. 2023 Aug 24;18(8):e0290090. doi: 10.1371/journal.pone.0290090. eCollection 2023.

Developmental patterning of peptide transcription in the central circadian clock in both sexes.

Front Neurosci. 2023 May 19;17:1177458. doi: 10.3389/fnins.2023.1177458. eCollection 2023.

Defining imaging sub-phenotypes of psoriatic arthritis: integrative analysis of imaging data and gene expression in a PsA patient cohort.

Rheumatology (Oxford). 2022 Nov 28;61(12):4952-4961. doi: 10.1093/rheumatology/keac078.

Unsupervised Machine Learning to Identify Separable Clinical Alzheimer's Disease Sub-Populations.

Brain Sci. 2021 Jul 23;11(8):977. doi: 10.3390/brainsci11080977.

Hypercluster: a flexible tool for parallelized unsupervised clustering optimization.

BMC Bioinformatics. 2020 Sep 29;21(1):428. doi: 10.1186/s12859-020-03774-1.

Robust subspace methods for outlier detection in genomic data circumvents the curse of dimensionality.

R Soc Open Sci. 2020 Feb 5;7(2):190714. doi: 10.1098/rsos.190714. eCollection 2020 Feb.

The Application of Unsupervised Clustering Methods to Alzheimer's Disease.

Front Comput Neurosci. 2019 May 24;13:31. doi: 10.3389/fncom.2019.00031. eCollection 2019.

Statistical analysis of multi-dimensional, temporal gene expression of stem cells to elucidate colony size-dependent neural differentiation.

Mol Omics. 2018 Apr 16;14(2):109-120. doi: 10.1039/c8mo00011e.
