Seno Shigeto, Teramoto Reiji, Takenaka Yoichi, Matsuda Hideo
Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka 560-8531, Japan.
Genome Inform. 2004;15(2):151-60.
Large amounts of gene expression data under various conditions have recently been obtained using DNA microarrays and oligonucleotide arrays, and there is growing demand to infer gene function from these expression profiles. Hierarchical clustering has been widely used to cluster genes by their expression profiles. It represents the relationships among genes as a tree, connecting genes according to similarity scores based on the Pearson correlation coefficient. However, this method is sensitive to experimental noise. To cope with this problem, we propose a different type of clustering method, p-quasi complete linkage clustering. We apply the method to gene expression data from the yeast cell cycle and from human lung cancer, and demonstrate its effectiveness by comparing its clustering results with those of other methods.
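The abstract names p-quasi complete linkage clustering without defining it. As a rough illustration only, the sketch below assumes that a cluster counts as p-quasi complete when every member has a Pearson correlation at or above some threshold with at least a fraction p of the other members; with p = 1 this reduces to the complete-linkage condition. This definition, the function names, the threshold value, and the toy profiles are all assumptions made for illustration and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a flat profile has no defined correlation; treat as unrelated
    return cov / (norm_x * norm_y)


def is_p_quasi_complete(profiles, members, p, threshold):
    """Check the assumed p-quasi completeness condition for one candidate cluster.

    Each member must be similar (correlation >= threshold) to at least
    p * (cluster size - 1) of the other members.
    """
    members = list(members)
    if len(members) < 2:
        return True
    needed = p * (len(members) - 1)
    for gene in members:
        linked = sum(
            1
            for other in members
            if other != gene
            and pearson(profiles[gene], profiles[other]) >= threshold
        )
        if linked < needed:
            return False
    return True


# Hypothetical toy profiles: geneC correlates with both geneA and geneB,
# but geneA and geneB are uncorrelated with each other.
profiles = {
    "geneA": [-1.0, -1.0, 1.0, 1.0],
    "geneB": [-1.0, 1.0, 1.0, -1.0],
    "geneC": [-2.0, 0.0, 2.0, 0.0],
}
cluster = ["geneA", "geneB", "geneC"]

strict = is_p_quasi_complete(profiles, cluster, p=1.0, threshold=0.5)
relaxed = is_p_quasi_complete(profiles, cluster, p=0.5, threshold=0.5)
print("complete linkage (p=1.0):", strict)
print("quasi complete (p=0.5):", relaxed)
```

In this toy case the single weak pair (geneA, geneB) breaks the cluster under the strict p = 1 condition, while p = 0.5 tolerates it. That is one plausible reading of how relaxing completeness could reduce sensitivity to noisy measurements.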