Evaluation of clustering algorithms for gene expression data using gene ontology annotations.

Affiliation

Department of Biomedical Engineering, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing 100005, China.

Publication information

Chin Med J (Engl). 2012 Sep;125(17):3048-52.

PMID:22932178

Abstract

BACKGROUND

Clustering is a useful exploratory technique for interpreting gene expression data, revealing groups of genes that share common functional attributes. Biologists frequently face the problem of choosing an appropriate algorithm. We aimed to provide a standalone, easily accessible and biologically oriented criterion for evaluating clusterings of expression data.

METHODS

This work proposes an external criterion that uses annotation-based similarities between genes, with Gene Ontology (GO) information as the annotation source. Using this criterion, we compared six widely used clustering algorithms across various types of gene expression data sets.
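The abstract does not give the formula for the criterion. As a minimal sketch, assuming each gene's GO annotations are available as a set of term IDs, one plausible annotation-based external criterion scores a partition by the mean GO similarity of gene pairs placed in the same cluster minus that of pairs placed in different clusters. The Jaccard index used as the gene-gene similarity, the names `jaccard` and `annotation_criterion`, and the toy annotations are hypothetical choices for illustration, not the authors' definition.

```python
from __future__ import annotations

from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of GO terms common to two genes (Jaccard index); 0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs) if xs else 0.0


def annotation_criterion(labels: dict[str, int], go: dict[str, set[str]]) -> float:
    """Hypothetical external criterion: mean GO similarity of gene pairs in the
    same cluster minus mean GO similarity of pairs in different clusters.
    Genes without GO annotations are skipped."""
    within: list[float] = []
    between: list[float] = []
    genes = sorted(g for g in labels if g in go)
    for g1, g2 in combinations(genes, 2):
        s = jaccard(go[g1], go[g2])
        (within if labels[g1] == labels[g2] else between).append(s)
    return mean(within) - mean(between)


# Toy GO annotations (illustrative only, not data from the paper).
go = {
    "g1": {"GO:0006915", "GO:0008219"},
    "g2": {"GO:0006915", "GO:0008219"},
    "g3": {"GO:0006412"},
    "g4": {"GO:0006412", "GO:0005840"},
}
good = {"g1": 0, "g2": 0, "g3": 1, "g4": 1}  # groups co-annotated genes together
bad = {"g1": 0, "g2": 1, "g3": 0, "g4": 1}   # splits co-annotated genes apart

print(annotation_criterion(good, go))  # 0.75
print(annotation_criterion(bad, go))   # -0.375
```

A partition that keeps co-annotated genes together scores higher than one that separates them. A real evaluation would substitute a proper GO semantic similarity measure and full annotation files for the toy sets above.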

RESULTS

The ranking of these algorithms produced by the criterion agrees with common knowledge. Single-linkage performs significantly worse than the other algorithms, even worse than the random algorithm. Ward's method achieves the best performance in most cases.
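The comparison with a random algorithm can be pictured as a permutation baseline. This sketch assumes the random algorithm assigns genes to clusters at random while keeping the observed cluster sizes, and that gene-pair annotation similarities have already been computed; the names `within_cluster_similarity`, `random_baseline` and `empirical_p`, and the toy similarity values, are hypothetical. Under this setup, a method whose score falls below most random partitions performs worse than chance on the criterion.

```python
from __future__ import annotations

import random
from itertools import combinations


def within_cluster_similarity(labels: dict[str, int], sim: dict[frozenset, float]) -> float:
    """Mean precomputed annotation similarity over gene pairs sharing a cluster."""
    scores = [
        sim[frozenset((a, b))]
        for a, b in combinations(sorted(labels), 2)
        if labels[a] == labels[b]
    ]
    return sum(scores) / len(scores) if scores else 0.0


def random_baseline(labels: dict[str, int], sim: dict[frozenset, float],
                    n_trials: int = 1000, seed: int = 0) -> list[float]:
    """Score random partitions with the observed cluster sizes by shuffling
    the cluster labels across genes (assumed model of a 'random algorithm')."""
    rng = random.Random(seed)
    genes = sorted(labels)
    shuffled = [labels[g] for g in genes]
    scores = []
    for _ in range(n_trials):
        rng.shuffle(shuffled)
        scores.append(within_cluster_similarity(dict(zip(genes, shuffled)), sim))
    return scores


def empirical_p(observed: float, baseline: list[float]) -> float:
    """Fraction of random partitions scoring at least as well (with +1 correction)."""
    hits = sum(b >= observed for b in baseline)
    return (hits + 1) / (len(baseline) + 1)


# Toy pairwise similarities (illustrative only, not data from the paper).
sim = {
    frozenset(("g1", "g2")): 1.0,
    frozenset(("g1", "g3")): 0.0,
    frozenset(("g1", "g4")): 0.0,
    frozenset(("g2", "g3")): 0.0,
    frozenset(("g2", "g4")): 0.0,
    frozenset(("g3", "g4")): 0.5,
}
good = {"g1": 0, "g2": 0, "g3": 1, "g4": 1}

observed = within_cluster_similarity(good, sim)
baseline = random_baseline(good, sim)
print(observed, empirical_p(observed, baseline))
```

Shuffling labels rather than drawing fresh cluster assignments keeps the cluster-size distribution fixed, so the baseline isolates how well genes are grouped from how many clusters there are.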

CONCLUSIONS

The proposed criterion clearly distinguishes among clustering algorithms that use different distance measures. We also show that analyzing the main contributors to the criterion may offer guidance for finding compact local clusters. In addition, we recommend Ward's algorithm for gene expression data analysis.
