Entropy-based cluster validation and estimation of the number of clusters in gene expression data.

Authors

Novoselova Natalia, Tom Igor

Affiliation

Department of Bioinformatics, United Institute of Informatics Problems, Surganova Street 6, Minsk 220012, Belarus.

Publication details

J Bioinform Comput Biol. 2012 Oct;10(5):1250011. doi: 10.1142/S0219720012500114. Epub 2012 Jun 26.

Abstract

Many external and internal validity measures have been proposed for estimating the number of clusters in gene expression data, but as a rule they do not analyze the stability of the groupings produced by a clustering algorithm. Building on approaches that assess the predictive power or stability of a partitioning, we propose a new cluster validation measure and a selection procedure for determining a suitable number of clusters. The validity measure estimates the "clearness" of the consensus matrix, which is the result of a resampling clustering scheme (consensus clustering). Under the proposed selection procedure, a stable clustering result is identified by reference to the validity measure obtained under the null hypothesis, which encodes the absence of clusters. The final number of clusters is selected by analyzing the distance between the validity plots for the initial and permuted data sets. We applied the selection procedure to evaluate clustering results on several datasets. The procedure produced accurate and robust estimates of the number of clusters, in agreement with biological knowledge and gold standards of cluster quality.
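
The abstract does not give the formula for the validity measure, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that "clearness" can be approximated by the mean binary entropy of the consensus matrix entries (entries near 0 or 1 mean a pair of samples is consistently separated or consistently co-clustered), and that the number of clusters is taken where the entropy curve for the original data lies furthest below the curve for permuted data. The function names `consensus_matrix`, `consensus_entropy`, and `select_k`, and the input layout, are hypothetical.

```python
import math
from itertools import combinations


def consensus_matrix(n_items, runs):
    """Build pairwise consensus values from resampled clustering runs.

    `runs` is a list of dicts, one per resampling run, mapping the index
    of each sampled item to its cluster label in that run (assumed layout).
    Returns {(i, j): fraction of runs containing both i and j in which
    they were assigned to the same cluster}.
    """
    together = [[0] * n_items for _ in range(n_items)]
    sampled = [[0] * n_items for _ in range(n_items)]
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
    return {
        (i, j): together[i][j] / sampled[i][j]
        for i, j in combinations(range(n_items), 2)
        if sampled[i][j] > 0
    }


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable; 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def consensus_entropy(consensus):
    """Mean binary entropy over all pairs; lower means a 'clearer' matrix.

    Assumption: this stands in for the paper's validity measure.
    """
    values = list(consensus.values())
    return sum(binary_entropy(p) for p in values) / len(values)


def select_k(entropy_data, entropy_null):
    """Pick the k with the largest gap between null and observed entropy.

    Both arguments map k -> entropy; the gap rule is an assumed
    simplification of 'distance between the validity plots'.
    """
    return max(entropy_data, key=lambda k: entropy_null[k] - entropy_data[k])


# Runs that always agree give a perfectly clear consensus matrix.
stable_runs = [
    {0: 0, 1: 0, 2: 1, 3: 1},
    {0: 1, 1: 1, 2: 0, 3: 0},  # labels swapped, same partition
    {0: 0, 1: 0, 3: 1},        # item 2 not sampled in this run
]
# Runs that disagree push consensus values toward 0.5.
unstable_runs = [
    {0: 0, 1: 0, 2: 1, 3: 1},
    {0: 0, 1: 1, 2: 0, 3: 1},
]
stable_score = consensus_entropy(consensus_matrix(4, stable_runs))
unstable_score = consensus_entropy(consensus_matrix(4, unstable_runs))
```

In practice each entry of `runs` would come from clustering a random subsample of the expression profiles at a fixed number of clusters, and the null curve from repeating the whole procedure on data whose values have been permuted to destroy any cluster structure.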
