Detecting clusters of different geometrical shapes in microarray gene expression data.

Author information

Kim Dae-Won, Lee Kwang H, Lee Doheon

Affiliation

Department of BioSystems and Advanced Information Technology Research Center, Korea Advanced Institute of Science and Technology, Yuseong-gu, Daejeon.

Publication information

Bioinformatics. 2005 May 1;21(9):1927-34. doi: 10.1093/bioinformatics/bti251. Epub 2005 Jan 12.

DOI:10.1093/bioinformatics/bti251

PMID:15647300

Abstract

MOTIVATION

Clustering is a popular technique for finding groups of genes that show similar expression patterns under multiple experimental conditions. Many methods have been proposed for clustering gene-expression data, including hierarchical clustering, k-means clustering and the self-organizing map (SOM). However, these conventional methods are limited in their ability to identify clusters of different shapes, because they use a fixed distance norm when calculating the distance between genes. A fixed distance norm imposes a fixed geometrical shape on the clusters regardless of the actual data distribution. Different distance norms are therefore required to handle clusters of different shapes.
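The fixed-norm argument can be made concrete with the notation commonly used in the fuzzy-clustering literature. The symbols below are an assumption for illustration, since the abstract itself gives no formulas. The squared distance between gene profile x_k and cluster prototype v_i under a norm-inducing matrix A is:

```latex
D_{ik}^{2} = \lVert \mathbf{x}_k - \mathbf{v}_i \rVert_{A}^{2}
           = (\mathbf{x}_k - \mathbf{v}_i)^{\top} A \,(\mathbf{x}_k - \mathbf{v}_i)
```

With A = I this is the Euclidean distance, whose level sets around every prototype are hyperspheres. Any other fixed A, such as a diagonal or a global covariance matrix, still gives every cluster the same ellipsoidal shape and orientation. Clusters of differing shapes therefore call for a matrix A_i that varies from cluster to cluster.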

RESULTS

We present the Gustafson-Kessel (GK) clustering method for microarray gene-expression data. To detect clusters of different shapes in a dataset, we use an adaptive distance norm calculated from a fuzzy covariance matrix (F) of each cluster, in which the eigenstructure of F serves as an indicator of the shape of the cluster. Moreover, the GK method is less prone to falling into local minima than k-means and SOM, because it makes decisions using the degree of membership of each gene in each cluster. The algorithm is carried out by alternating optimization, which iteratively improves a sequence of sets of clusters until no further improvement is possible. To test the performance of the GK method, we applied it and well-known conventional methods to three recently published yeast datasets, and compared the performance of each method using the Saccharomyces Genome Database annotations. The clustering results of the GK method show more significant relevance to the biological annotations than those of the other methods, demonstrating its effectiveness and potential for clustering gene-expression data.
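The alternating optimization described above can be sketched with NumPy. This is a minimal sketch of the textbook Gustafson-Kessel iteration, not the authors' Java software. The function name `gk_cluster`, the fuzziness exponent m = 2, unit cluster volumes rho_i = 1, the random initial partition and the stopping tolerance are all illustrative assumptions, and no regularization of near-singular covariance matrices is attempted. Each pass (1) computes prototypes v_i as membership-weighted means, (2) forms each cluster's fuzzy covariance matrix F_i, (3) builds the cluster-specific norm matrix A_i = [rho_i det(F_i)]^(1/n) F_i^(-1), so that distances follow the eigenstructure of that cluster's own F_i, and (4) updates the membership degrees u_ik from those distances.

```python
import numpy as np


def gk_cluster(X, c, m=2.0, rho=None, tol=1e-5, max_iter=200, seed=0):
    """Minimal Gustafson-Kessel fuzzy clustering sketch (assumed defaults).

    X: (N, n) data matrix, e.g. genes x conditions.
    Returns prototypes V (c, n), memberships U (c, N), fuzzy covariances F (c, n, n).
    Not the authors' implementation; parameters are illustrative.
    """
    rng = np.random.default_rng(seed)
    N, n = X.shape
    rho = np.ones(c) if rho is None else np.asarray(rho, dtype=float)  # cluster volumes
    # Random initial fuzzy partition: each column (one gene) sums to 1.
    U = rng.random((c, N))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Step 1: prototypes as membership-weighted means.
        V = Um @ X / Um.sum(axis=1, keepdims=True)
        D2 = np.empty((c, N))
        F = np.empty((c, n, n))
        for i in range(c):
            diff = X - V[i]
            # Step 2: fuzzy covariance matrix of cluster i.
            F[i] = (Um[i, :, None] * diff).T @ diff / Um[i].sum()
            # Step 3: adaptive norm matrix with the cluster volume fixed to rho[i].
            A = (rho[i] * np.linalg.det(F[i])) ** (1.0 / n) * np.linalg.inv(F[i])
            D2[i] = np.einsum("kj,jl,kl->k", diff, A, diff)
        D2 = np.fmax(D2, 1e-12)  # guard against a point lying exactly on a prototype
        # Step 4: u_ik = 1 / sum_j (D2_ik / D2_jk)^(1/(m-1)).
        inv = D2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return V, U, F


# Usage on hypothetical synthetic data: two elongated clusters at right angles.
rng = np.random.default_rng(1)
horizontal = rng.normal(size=(100, 2)) * [3.0, 0.3]
vertical = rng.normal(size=(100, 2)) * [0.3, 3.0] + [20.0, 0.0]
X = np.vstack([horizontal, vertical])
V, U, F = gk_cluster(X, c=2)
labels = U.argmax(axis=0)  # hard labels taken from the fuzzy memberships
# Eigenvalues of each F_i describe that cluster's shape (elongation, orientation).
print([np.linalg.eigvalsh(Fi) for Fi in F])
```

The data here are synthetic and chosen only to show per-cluster shapes; with a fixed Euclidean norm the two perpendicular, elongated groups would both be modelled as spheres, whereas each GK cluster acquires its own elongated norm.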

AVAILABILITY

The software was developed in Java and can be run on any platform with a Java Virtual Machine (JVM). It is available from the authors upon request.

SUPPLEMENTARY INFORMATION

Supplementary data are available at http://dragon.kaist.ac.kr/gk.
