Divisive Correlation Clustering Algorithm (DCCA) for grouping of genes: detecting varying patterns in expression profiles.

Authors

Bhattacharya Anindya, De Rajat K

Affiliations

Department of Computer Science and Engineering, Netaji Subhash Engineering College, Garia and Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India.

Publication information

Bioinformatics. 2008 Jun 1;24(11):1359-66. doi: 10.1093/bioinformatics/btn133. Epub 2008 Apr 10.

Abstract

MOTIVATION

Cluster analysis of gene-expression data is a useful tool for identifying biologically relevant groups of genes that show similar expression patterns under multiple experimental conditions. Various methods have been proposed for clustering gene-expression data, but most of them have several shortcomings for this task. In this article, we focus on several shortcomings of conventional clustering algorithms and propose a new algorithm that produces better clustering solutions than some existing ones.

RESULTS

We present the Divisive Correlation Clustering Algorithm (DCCA), which is suited to finding groups of genes with similar patterns of variation in their expression values. To detect clusters with high correlation and biological significance, we use the correlation clustering concept introduced by Bansal et al. DCCA produces a clustering solution without requiring the number of clusters as input. It uses the correlation matrix so that every gene in a cluster has its highest average correlation with the genes in that cluster. To test its performance, we applied DCCA and several well-known conventional methods to one artificial dataset and nine gene-expression datasets and compared the results. The clusters produced by DCCA are found to be more significantly relevant to the biological annotations than those of the other methods. These results indicate that DCCA outperforms some other methods for clustering gene-expression data.
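The abstract states the placement criterion but not the procedure behind it. As an illustration only, the sketch below assumes Pearson correlation between expression profiles and repeatedly moves each gene to the cluster with which it has the highest average correlation, until no gene changes cluster. It is not the authors' DCCA implementation, and it omits the divisive step that determines the number of clusters. All function names (`pearson`, `correlation_matrix`, `average_correlation`, `reassign`) and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a flat profile is treated as uncorrelated
    return cov / (sx * sy)


def correlation_matrix(profiles):
    """Gene-by-gene correlation matrix (one row of `profiles` per gene)."""
    n = len(profiles)
    return [[pearson(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]


def average_correlation(corr, gene, members):
    """Mean correlation of `gene` with the cluster members, excluding itself."""
    others = [m for m in members if m != gene]
    if not others:
        return 0.0  # assumption: a singleton cluster scores zero
    return sum(corr[gene][m] for m in others) / len(others)


def reassign(corr, labels, max_iter=100):
    """Move genes to their best-correlated cluster until the labels are stable."""
    labels = list(labels)
    for _ in range(max_iter):
        clusters = {}
        for gene, label in enumerate(labels):
            clusters.setdefault(label, []).append(gene)
        new_labels = [
            max(clusters, key=lambda c: average_correlation(corr, g, clusters[c]))
            for g in range(len(labels))
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Toy data: genes 0 and 1 rise across conditions, genes 2 and 3 fall.
profiles = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [4, 3, 2, 1],
    [8, 6, 4, 2],
]
corr = correlation_matrix(profiles)
# Gene 2 starts in the wrong cluster and is moved next to gene 3.
final_labels = reassign(corr, [0, 0, 0, 1])
print(final_labels)
```

Because the sketch works on correlations rather than distances, genes 0 and 1 are grouped together even though their absolute expression values differ; only the pattern of variation matters.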

AVAILABILITY

The software was developed in C and Visual Basic and runs on Microsoft Windows platforms. It can be downloaded as a zip file from http://www.isical.ac.in/~rajat and must then be installed. Two Word files included in the zip file should be consulted before installing and running the software.
