Liu Ying, Navathe Shamkant B, Civera Jorge, Dasigi Venu, Ram Ashwin, Ciliax Brian J, Dingledine Ray
College of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA 30322, USA.
IEEE/ACM Trans Comput Biol Bioinform. 2005 Jan-Mar;2(1):62-76. doi: 10.1109/TCBB.2005.14.
Partitioning closely related genes into clusters has become an important element of practically all statistical analyses of microarray data. A number of computer algorithms have been developed for this task. Although these algorithms have demonstrated their usefulness for gene clustering, some basic problems remain. This paper describes our work on extracting functional keywords from MEDLINE abstracts for a set of genes that are isolated for further study from microarray experiments based on their differential expression patterns. The sharing of functional keywords among genes is used as the basis for clustering in a new approach called BEA-PARTITION. We modified the Bond Energy Algorithm (BEA), which is widely accepted in psychology and database design but is virtually unknown in bioinformatics, to cluster genes by functional keyword associations. The results showed that BEA-PARTITION and the hierarchical clustering algorithm outperformed k-means clustering and the self-organizing map by correctly assigning 25 of 26 genes in a test set of four known gene groups. To evaluate the effectiveness of BEA-PARTITION for clustering genes identified by microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle and have been widely studied in the literature were used as a second test set. Using established measures of cluster quality, the results produced by BEA-PARTITION had higher purity, lower entropy, and higher mutual information than those produced by k-means and the self-organizing map. Whereas BEA-PARTITION and hierarchical clustering produced clusters of similar quality, BEA-PARTITION provides clear cluster boundaries compared to hierarchical clustering. BEA-PARTITION is simple to implement and provides a powerful approach to clustering genes or to any clustering problem where starting matrices are available from experimental observations.
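Two of the cluster quality measures named in the abstract, purity and entropy, can be illustrated with a minimal sketch. It uses the common textbook definitions, which may differ in detail from the formulas used in the paper. The function names, the gene identifiers (g1 to g6), and the class labels (A, B) are hypothetical and do not come from the study. Mutual information is left out because its normalization varies between authors.

```python
from collections import Counter
from math import log2


def purity(clusters, labels):
    """Fraction of items that carry the majority class label of their cluster."""
    clusters = [c for c in clusters if c]  # ignore empty clusters
    total = sum(len(c) for c in clusters)
    majority = sum(
        Counter(labels[g] for g in c).most_common(1)[0][1] for c in clusters
    )
    return majority / total


def entropy(clusters, labels):
    """Size-weighted average of the class entropy (in bits) inside each cluster."""
    clusters = [c for c in clusters if c]  # ignore empty clusters
    total = sum(len(c) for c in clusters)
    result = 0.0
    for c in clusters:
        counts = Counter(labels[g] for g in c)
        h = -sum((n / len(c)) * log2(n / len(c)) for n in counts.values())
        result += (len(c) / total) * h
    return result


# Hypothetical example: six genes, two known functional groups (A and B).
labels = {"g1": "A", "g2": "A", "g3": "B", "g4": "B", "g5": "B", "g6": "B"}
clusters = [["g1", "g2", "g3"], ["g4", "g5", "g6"]]  # g3 is misplaced

print(purity(clusters, labels))
print(entropy(clusters, labels))
```

Under these definitions a perfect clustering has purity 1 and entropy 0, so higher purity and lower entropy both indicate clusters that agree more closely with the known gene groups.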