An improved algorithm for clustering gene expression data.

Author information

Bandyopadhyay Sanghamitra, Mukhopadhyay Anirban, Maulik Ujjwal

Affiliation

Machine Intelligence Unit, Indian Statistical Institute, Kolkata-700108, India.

Publication information

Bioinformatics. 2007 Nov 1;23(21):2859-65. doi: 10.1093/bioinformatics/btm418. Epub 2007 Aug 25.

DOI:10.1093/bioinformatics/btm418

PMID:17720981

Abstract

MOTIVATION

Recent advances in microarray technology allow simultaneous monitoring of the expression levels of a large number of genes over different time points. Clustering is an important tool for analyzing such microarray data, which are typically characterized by inherent uncertainty, noise and imprecision. In this article, a two-stage clustering algorithm is proposed that employs a recently developed variable string length genetic scheme and a multiobjective genetic clustering algorithm. It is based on the novel concept of points having significant membership to multiple classes. An iterated version of the well-known Fuzzy C-Means is also used for clustering.
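To make two of the ingredients named above more concrete, here is a minimal illustrative sketch in pure Python: a plain, single-run Fuzzy C-Means (not the authors' iterated variant) and a helper that flags points whose membership is split across clusters. The flagging rule (the two largest memberships differ by less than a threshold), the threshold value, and the function names `fuzzy_c_means` and `ambiguous_points` are assumptions chosen for illustration. The genetic stages (variable string length genetic scheme and multiobjective genetic clustering) are not shown.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_c_means(data: List[Point], c: int, m: float = 2.0, iters: int = 100,
                  tol: float = 1e-6, seed: int = 0
                  ) -> Tuple[List[List[float]], List[List[float]]]:
    """Standard (single-run) Fuzzy C-Means.

    Returns (centers, U), where U[k][i] is the membership of point k in cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships, normalized so each row sums to 1.
    U = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])
    centers = [[0.0] * dim for _ in range(c)]
    power = 1.0 / (m - 1.0)  # exponent 2/(m-1) applied to squared distances
    for _ in range(iters):
        # Center update: mean of the points weighted by membership^m.
        for i in range(c):
            w = [U[k][i] ** m for k in range(n)]
            wsum = sum(w)
            centers[i] = [sum(w[k] * data[k][d] for k in range(n)) / wsum
                          for d in range(dim)]
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(1/(m-1)), d squared.
        max_change = 0.0
        for k in range(n):
            d = [_sq_dist(data[k], centers[i]) for i in range(c)]
            if min(d) == 0.0:
                # Point coincides with a center: full membership in that cluster.
                j0 = d.index(0.0)
                new_row = [1.0 if i == j0 else 0.0 for i in range(c)]
            else:
                new_row = [1.0 / sum((d[i] / d[j]) ** power for j in range(c))
                           for i in range(c)]
            max_change = max(max_change,
                             max(abs(a - b) for a, b in zip(new_row, U[k])))
            U[k] = new_row
        if max_change < tol:
            break
    return centers, U


def ambiguous_points(U: List[List[float]], threshold: float = 0.1) -> List[int]:
    """Indices of points whose two largest memberships differ by less than
    `threshold` (an assumed criterion; requires at least two clusters)."""
    flagged = []
    for k, row in enumerate(U):
        top = sorted(row, reverse=True)
        if top[0] - top[1] < threshold:
            flagged.append(k)
    return flagged


if __name__ == "__main__":
    # Toy data: two tight groups plus one point halfway between them.
    data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
            [5.0, 5.0], [4.9, 5.0], [5.0, 4.9],
            [2.5, 2.5]]
    centers, U = fuzzy_c_means(data, c=2, iters=300)
    print(ambiguous_points(U, threshold=0.1))
```

On this toy input the midpoint ends up with roughly equal membership in both clusters, so it is the kind of point that a two-stage scheme could set aside in the first stage and handle separately; the points in the two tight groups are assigned almost entirely to one cluster each.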

RESULTS

The proposed two-stage clustering algorithm is shown to be significantly superior to the average linkage method, the Self Organizing Map (SOM) and a recently developed weighted Chinese restaurant-based clustering method (CRC), all widely used for clustering gene expression data, on a variety of artificial and publicly available real-life data sets. The biological relevance of the clustering solutions is also analyzed.

Similar articles

Graph-based consensus clustering for class discovery from gene expression data.

Bioinformatics. 2007 Nov 1;23(21):2888-96. doi: 10.1093/bioinformatics/btm463. Epub 2007 Sep 14.

Clustering of gene expression data: performance and similarity analysis.

BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S19. doi: 10.1186/1471-2105-7-S4-S19.

Clustering of change patterns using Fourier coefficients.

Bioinformatics. 2008 Jan 15;24(2):184-91. doi: 10.1093/bioinformatics/btm568. Epub 2007 Nov 19.

Divisive Correlation Clustering Algorithm (DCCA) for grouping of genes: detecting varying patterns in expression profiles.

Bioinformatics. 2008 Jun 1;24(11):1359-66. doi: 10.1093/bioinformatics/btn133. Epub 2008 Apr 10.

A mixture model with random-effects components for clustering correlated gene-expression profiles.

Bioinformatics. 2006 Jul 15;22(14):1745-52. doi: 10.1093/bioinformatics/btl165. Epub 2006 May 3.

TimeClust: a clustering tool for gene expression time series.

Bioinformatics. 2008 Feb 1;24(3):430-2. doi: 10.1093/bioinformatics/btm605. Epub 2007 Dec 6.

Detecting clusters of different geometrical shapes in microarray gene expression data.

Bioinformatics. 2005 May 1;21(9):1927-34. doi: 10.1093/bioinformatics/bti251. Epub 2005 Jan 12.

Evaluation of clustering algorithms for gene expression data.

BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S17. doi: 10.1186/1471-2105-7-S4-S17.

Hierarchical tree snipping: clustering guided by prior knowledge.

Bioinformatics. 2007 Dec 15;23(24):3335-42. doi: 10.1093/bioinformatics/btm526. Epub 2007 Nov 7.

Cited by

MSC-CSMC: A multi-objective semi-supervised clustering algorithm based on constraints selection and multi-source constraints for gene expression data.

Front Genet. 2023 Feb 27;14:1135260. doi: 10.3389/fgene.2023.1135260. eCollection 2023.

A hybrid multi-objective whale optimization algorithm for analyzing microarray data based on Apache Spark.

PeerJ Comput Sci. 2021 Mar 25;7:e416. doi: 10.7717/peerj-cs.416. eCollection 2021.

Unsupervised Learning and Multipartite Network Models: A Promising Approach for Understanding Traditional Medicine.

Front Pharmacol. 2020 Aug 26;11:1319. doi: 10.3389/fphar.2020.01319. eCollection 2020.

Overlapping clustering of gene expression data using penalized weighted normalized cut.

Genet Epidemiol. 2018 Dec;42(8):796-811. doi: 10.1002/gepi.22164. Epub 2018 Oct 9.

A multi-objective gene clustering algorithm guided by apriori biological knowledge with intensification and diversification strategies.

BioData Min. 2018 Aug 7;11:16. doi: 10.1186/s13040-018-0178-4. eCollection 2018.

Silhouette Scores for Arbitrary Defined Groups in Gene Expression Data and Insights into Differential Expression Results.

Biol Proced Online. 2018 Mar 1;20:5. doi: 10.1186/s12575-018-0067-8. eCollection 2018.

Clustering Algorithms: Their Application to Gene Expression Data.

Bioinform Biol Insights. 2016 Nov 30;10:237-253. doi: 10.4137/BBI.S38316. eCollection 2016.

Identifying Cancer Biomarkers From Microarray Data Using Feature Selection and Semisupervised Learning.

IEEE J Transl Eng Health Med. 2014 Dec 2;2:4300211. doi: 10.1109/JTEHM.2014.2375820. eCollection 2014.

GESearch: An Interactive GUI Tool for Identifying Gene Expression Signature.

Biomed Res Int. 2015;2015:853734. doi: 10.1155/2015/853734. Epub 2015 Jun 25.

Interpolation based consensus clustering for gene expression time series.

BMC Bioinformatics. 2015 Apr 16;16:117. doi: 10.1186/s12859-015-0541-0.
