Chatterjee Soumyadeep, Bhattacharjee Kasturi, Konar Amit
Artificial Intelligence Laboratory, Jadavpur University, Kolkata, India.
Biotechnol J. 2009 Sep;4(9):1357-61. doi: 10.1002/biot.200800219.
Microarray technology has transformed the life sciences by allowing the expression levels of thousands of genes in an organism to be monitored simultaneously. Statistical analysis of expression data remains challenging, however, both because of the sheer volume of data involved and because noise is an almost inherent characteristic of microarray data. Clustering is one tool used to mine meaningful patterns from such data. In this paper, we present a novel method for clustering yeast microarray data that is robust yet simple to implement. It identifies the best clusters in a given dataset on the basis of two criteria: the population of each cluster and the variance of its members' feature values about the cluster center. The method has been found to yield satisfactory results even in the presence of noisy data.
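The abstract names the two selection criteria, cluster population and variance about the cluster center, but not how they are combined. The following is a minimal Python sketch of one way such a ranking could look, not the authors' algorithm. The function names, the score `population / (1 + variance)`, and the toy profiles are all assumptions introduced for illustration.

```python
from statistics import fmean


def cluster_center(members):
    """Component-wise mean of the members' expression profiles."""
    return [fmean(column) for column in zip(*members)]


def variance_from_center(members):
    """Mean squared Euclidean distance of the members from the cluster center."""
    center = cluster_center(members)
    return fmean(
        sum((x - c) ** 2 for x, c in zip(profile, center))
        for profile in members
    )


def rank_clusters(clusters):
    """Order cluster names from best to worst.

    ASSUMPTION: the score population / (1 + variance) is a hypothetical
    combination chosen for illustration; the abstract does not state the
    actual criterion. It rewards populous clusters and penalizes spread.
    """
    scores = {
        name: len(members) / (1.0 + variance_from_center(members))
        for name, members in clusters.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy expression profiles (made-up numbers, not yeast data).
clusters = {
    "tight": [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 2.0]],
    "loose": [[0.0, 5.0], [4.0, 1.0], [2.0, 9.0], [7.0, 3.0]],
    "small": [[3.0, 3.0]],
}
ranking = rank_clusters(clusters)
print(ranking)
```

Under this assumed score, a populous and compact cluster ranks above both a singleton and an equally populous but widely scattered one, which matches the intuition behind the two criteria.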