Tasoulis D K, Plagianakos V P, Vrahatis M N
Computational Intelligence Laboratory (CILAB), Department of Mathematics, University of Patras, GR-26110 Patras, Greece.
Comput Biol Med. 2006 Oct;36(10):1126-42. doi: 10.1016/j.compbiomed.2005.09.003. Epub 2005 Oct 24.
The development of microarray technologies gives scientists the ability to examine, discover, and monitor the mRNA transcript levels of thousands of genes in a single experiment. Nonetheless, the tremendous amount of data that can be obtained from microarray studies presents a challenge for data analysis. The most commonly used computational approach for analyzing microarray data is cluster analysis, since the number of genes is usually very high compared to the number of samples. In this paper, we investigate the application of the recently proposed k-windows clustering algorithm to gene expression microarray data. Apart from identifying the clusters present in a data set, this algorithm also calculates their number and thus requires no special knowledge about the data. To improve the quality of the clustering, we employ various dimension reduction techniques and propose a hybrid one. The results obtained by applying the algorithm exhibit high classification success.