Detecting clusters of different geometrical shapes in microarray gene expression data.

Authors

Kim Dae-Won, Lee Kwang H, Lee Doheon

Affiliation

Department of BioSystems and Advanced Information Technology Research Center, Korea Advanced Institute of Science and Technology, Yuseong-gu, Daejeon.

Publication information

Bioinformatics. 2005 May 1;21(9):1927-34. doi: 10.1093/bioinformatics/bti251. Epub 2005 Jan 12.

Abstract

MOTIVATION

Clustering is a popular technique for finding groups of genes that show similar expression patterns under multiple experimental conditions. Many clustering methods have been proposed for gene-expression data, including hierarchical clustering, k-means clustering and the self-organizing map (SOM). However, these conventional methods are limited in their ability to identify clusters of different shapes because they use a fixed distance norm when calculating the distance between genes. A fixed distance norm imposes a fixed geometrical shape on the clusters regardless of the actual data distribution. Thus, different distance norms are required to handle clusters of different shapes.

RESULTS

We present the Gustafson-Kessel (GK) clustering method for microarray gene-expression data. To detect clusters of different shapes in a dataset, we use an adaptive distance norm calculated from the fuzzy covariance matrix (F) of each cluster; the eigenstructure of F indicates the shape of the cluster. Moreover, the GK method is less prone to falling into local minima than k-means and SOM because it makes decisions using the membership degrees of a gene to the clusters. The algorithm proceeds by alternating optimization, which iteratively improves a sequence of sets of clusters until no further improvement is possible. To test its performance, we applied the GK method and well-known conventional methods to three recently published yeast datasets and compared the methods using the Saccharomyces Genome Database annotations. The clustering results of the GK method are significantly more relevant to the biological annotations than those of the other methods, demonstrating its effectiveness and potential for clustering gene-expression data.
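The alternating optimization described above can be illustrated with a small sketch. This is not the authors' Java software. It is a minimal Python version of the standard Gustafson-Kessel formulation, written under several assumptions that the abstract does not state: a fuzzifier m = 2, unit cluster volumes, a tiny ridge term added to each covariance matrix to keep it invertible, and a simple deterministic initialization from evenly spaced data points. All function and variable names (`gk_cluster`, `memberships`, `inverse_and_det`) are hypothetical, and the toy data are made up.

```python
def inverse_and_det(matrix):
    """Gauss-Jordan elimination with partial pivoting: (inverse, determinant)."""
    d = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(d)]
           for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-300:
            raise ValueError("singular matrix")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        p = aug[col][col]
        det *= p
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[d:] for row in aug], det


def memberships(data, centers, norms, m):
    """Fuzzy memberships u[i][k] of point k in cluster i under per-cluster norms."""
    d, c = len(data[0]), len(centers)
    expo = 1.0 / (m - 1.0)
    u = [[0.0] * len(data) for _ in range(c)]
    for k, x in enumerate(data):
        dist2 = []
        for v, a in zip(centers, norms):
            diff = [x[j] - v[j] for j in range(d)]
            q = sum(diff[r] * a[r][s] * diff[s] for r in range(d) for s in range(d))
            dist2.append(max(q, 1e-12))  # guard against a point sitting on a center
        for i in range(c):
            u[i][k] = 1.0 / sum((dist2[i] / dist2[j]) ** expo for j in range(c))
    return u


def gk_cluster(data, c, m=2.0, tol=1e-6, max_iter=100, ridge=1e-9):
    """Alternate between updating (centers, covariances, norms) and memberships."""
    n, d = len(data), len(data[0])
    # Assumed initialization: evenly spaced data points as centers, Euclidean norm.
    centers = [list(data[i * (n - 1) // max(c - 1, 1)]) for i in range(c)]
    identity = [[1.0 if a == b else 0.0 for b in range(d)] for a in range(d)]
    u = memberships(data, centers, [identity] * c, m)
    covs = []
    for _ in range(max_iter):
        centers, covs, norms = [], [], []
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            total = sum(w)
            v = [sum(w[k] * data[k][a] for k in range(n)) / total for a in range(d)]
            # Fuzzy covariance matrix F of cluster i (plus the assumed ridge term).
            f = [[sum(w[k] * (data[k][a] - v[a]) * (data[k][b] - v[b])
                      for k in range(n)) / total + (ridge if a == b else 0.0)
                  for b in range(d)] for a in range(d)]
            inv, det = inverse_and_det(f)
            scale = det ** (1.0 / d)  # unit cluster volume: A = det(F)^(1/d) * F^-1
            centers.append(v)
            covs.append(f)
            norms.append([[scale * x for x in row] for row in inv])
        new_u = memberships(data, centers, norms, m)
        change = max(abs(new_u[i][k] - u[i][k]) for i in range(c) for k in range(n))
        u = new_u
        if change < tol:  # no further improvement in the memberships
            break
    return u, centers, covs


# Toy data (made up for illustration): one cluster stretched along x, one along y.
group_a = [(float(t), 0.3 * ((t % 3) - 1)) for t in range(10)]
group_b = [(30.0 + 0.3 * ((t % 3) - 1), float(t)) for t in range(10)]
u, centers, covs = gk_cluster(group_a + group_b, c=2)
labels = [max(range(2), key=lambda i: u[i][k]) for k in range(20)]
print(labels)
```

Because each cluster carries its own norm matrix derived from F, the two toy clusters can be stretched in different directions at the same time, which a single fixed norm cannot represent. The returned covariance matrices can be inspected (for example, by comparing their diagonal entries or eigenvalues) to see each cluster's orientation.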

AVAILABILITY

The software was developed in Java and can be executed on any platform that runs a JVM (Java Virtual Machine). It is available from the authors upon request.

SUPPLEMENTARY INFORMATION

Supplementary data are available at http://dragon.kaist.ac.kr/gk.
