Yang Zheng Rong
School of Engineering and Computer Science, Exeter University, Exeter EX4 4QF, UK.
Bioinformatics. 2004 Nov 1;20(16):2759-66. doi: 10.1093/bioinformatics/bth323. Epub 2004 May 27.
It is well established that clustering genes is useful for extracting scientific knowledge from DNA microarray gene expression data. This knowledge can ultimately be used to annotate the biological functions of novel genes. How efficiently the extracted knowledge is represented is therefore closely related to classification accuracy. However, this issue has not yet received the attention it deserves.
This study develops a novel method, based on template theory from cognitive psychology and pattern recognition, for effectively representing knowledge extracted from cluster analysis. The basic principle is to represent knowledge in terms of the relationship between genes and the discovered cluster structure. Using this knowledge representation, a pattern recognition algorithm (the decision tree algorithm C4.5) is then used to build a classifier that annotates the biological functions of novel genes. Experiments on five published datasets show that this method improves classification performance compared with the conventional method, and statistical tests indicate that the improvement is significant.
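The abstract does not specify how the relationship between a gene and the cluster structure is measured, so the following is only a minimal sketch under stated assumptions: each cluster's "template" is taken to be its mean expression profile (centroid), and a gene is re-described by its Euclidean distance to every template. The function names (`centroid`, `template_features`) and the toy data are hypothetical and illustrate the idea of template-based representation, not the author's implementation.

```python
import math
from typing import List, Sequence

Profile = Sequence[float]


def centroid(profiles: List[Profile]) -> List[float]:
    """Mean expression profile of one cluster, used here (by assumption) as its template."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]


def euclidean(a: Profile, b: Profile) -> float:
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def template_features(gene: Profile, templates: List[List[float]]) -> List[float]:
    """Re-describe a gene by its distance to each cluster template (assumed measure).

    The raw expression values are replaced by one feature per cluster,
    so the feature vector length equals the number of clusters.
    """
    return [euclidean(gene, t) for t in templates]


# Hypothetical toy data: two clusters of genes measured under three conditions.
clusters = [
    [[1.0, 2.0, 3.0], [1.0, 2.0, 5.0]],
    [[6.0, 6.0, 6.0], [8.0, 8.0, 8.0]],
]
templates = [centroid(c) for c in clusters]  # [[1.0, 2.0, 4.0], [7.0, 7.0, 7.0]]

new_gene = [1.0, 2.0, 4.0]
features = template_features(new_gene, templates)  # [0.0, sqrt(70)]
```

In a full pipeline, feature vectors like these (one per gene, labelled with known functional classes) would be passed to a decision-tree learner such as C4.5; other relationship measures, such as correlation with each template, would fit the same structure.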
The software package is available from the author upon request.