Entropy-based cluster validation and estimation of the number of clusters in gene expression data.

Author information

Novoselova Natalia, Tom Igor

Affiliation

Department of Bioinformatics, United Institute of Informatics Problems, Surganova Street 6, Minsk 220012, Belarus.

Publication information

J Bioinform Comput Biol. 2012 Oct;10(5):1250011. doi: 10.1142/S0219720012500114. Epub 2012 Jun 26.

Abstract

Many external and internal validity measures have been proposed for estimating the number of clusters in gene expression data, but as a rule they do not analyze the stability of the groupings produced by a clustering algorithm. Building on approaches that assess the predictive power or stability of a partitioning, we propose a new cluster validation measure and a selection procedure for determining a suitable number of clusters. The validity measure is based on estimating the "clearness" of the consensus matrix, which is the result of a resampling clustering scheme, or consensus clustering. In the proposed selection procedure, a stable clustering result is identified by reference to the validity measure under the null hypothesis, which encodes the absence of clusters. The final number of clusters is selected by analyzing the distance between the validity plots for the initial and permuted data sets. We applied the selection procedure to evaluate clustering results on several data sets. The proposed procedure produced accurate and robust estimates of the number of clusters, in agreement with biological knowledge and gold standards of cluster quality.
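The abstract does not state the formula behind the "clearness" measure, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes that clearness is scored as the mean binary entropy of the pairwise consensus values (0 when every pair of items is always, or never, clustered together) and that the number of clusters is chosen where the entropy curve for the original data lies farthest below the curve for the permuted (null) data. The function names `consensus_matrix`, `consensus_entropy`, and `select_k` are hypothetical, and the clustering algorithm itself is left out: each resampled run is passed in as a mapping from item index to cluster label.

```python
import math
from itertools import combinations


def consensus_matrix(n_items, runs):
    """Build pairwise consensus values from resampled clusterings.

    runs: list of dicts, each mapping a sampled item index to its
    cluster label in one resampled clustering run.
    Returns {(i, j): fraction of co-sampled runs where i and j share a cluster}.
    """
    together = {}
    sampled = {}
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[(i, j)] = sampled.get((i, j), 0) + 1
            if labels[i] == labels[j]:
                together[(i, j)] = together.get((i, j), 0) + 1
    # Pairs that were never sampled together are omitted.
    return {
        pair: together.get(pair, 0) / count
        for pair, count in sampled.items()
        if pair[0] < n_items and pair[1] < n_items
    }


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def consensus_entropy(consensus):
    """ASSUMED validity measure: mean binary entropy of consensus values.

    Lower values mean a "clearer" consensus matrix (entries near 0 or 1).
    """
    values = list(consensus.values())
    return sum(binary_entropy(p) for p in values) / len(values)


def select_k(entropy_data, entropy_null):
    """ASSUMED selection rule: the k with the largest gap between the
    null (permuted data) curve and the original data curve."""
    return max(entropy_data, key=lambda k: entropy_null[k] - entropy_data[k])


# Three resampled runs over four items; labels are arbitrary per run.
runs = [
    {0: 0, 1: 0, 2: 1, 3: 1},
    {0: 1, 1: 1, 2: 0},
    {1: 0, 2: 1, 3: 1},
]
consensus = consensus_matrix(4, runs)
clearness = consensus_entropy(consensus)
```

In this toy input, items 0 and 1 always fall in the same cluster, as do items 2 and 3, and no other pair ever does, so every consensus value is 0 or 1 and the entropy score is zero. To apply `select_k`, one would compute `consensus_entropy` for each candidate number of clusters on both the original data and a permuted copy, and pass the two resulting dictionaries keyed by the number of clusters.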
