Entropy-based cluster validation and estimation of the number of clusters in gene expression data.

Authors

Novoselova Natalia, Tom Igor

Affiliation

Department of Bioinformatics, United Institute of Informatics Problems, Surganova Street 6, Minsk 220012, Belarus.

Publication information

J Bioinform Comput Biol. 2012 Oct;10(5):1250011. doi: 10.1142/S0219720012500114. Epub 2012 Jun 26.

DOI:10.1142/S0219720012500114

PMID:22849366

Abstract

Many external and internal validity measures have been proposed to estimate the number of clusters in gene expression data, but as a rule they do not analyze the stability of the groupings produced by a clustering algorithm. Building on the approach that assesses the predictive power or stability of a partitioning, we propose a new cluster validation measure and a selection procedure for determining a suitable number of clusters. The validity measure is based on estimating the "clearness" of the consensus matrix, which is the result of a resampling clustering scheme, or consensus clustering. Under the proposed selection procedure, a stable clustering result is identified with reference to the validity measure computed under a null hypothesis that encodes the absence of clusters. The final number of clusters is selected by analyzing the distance between the validity plots for the initial and permuted data sets. We applied the selection procedure to evaluate clustering results on several data sets. The procedure produced accurate and robust estimates of the number of clusters, in agreement with biological knowledge and gold standards of cluster quality.
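The abstract does not give the formula for the validity measure or the exact selection rule, so the following is only an illustrative sketch of the general idea (consensus matrix, entropy-style "clearness", comparison with permuted data), not the authors' method. Every choice in it is an assumption: plain k-means as the base clusterer, 80% subsampling, mean binary entropy of the consensus entries as the clearness score, independent permutation of each feature column as the no-cluster reference, and picking the k with the largest gap between the two entropy curves. All function names are hypothetical.

```python
import math
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means (assumed base clusterer); returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(points, k, rng, runs=30, frac=0.8):
    """M[i][j]: share of subsamples containing both i and j in which they co-cluster."""
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(runs):
        idx = rng.sample(range(n), max(k, int(frac * n)))
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    # Pairs that were never drawn together default to 0.0.
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def mean_entropy(M):
    """Mean binary entropy of the off-diagonal entries; 0 means a perfectly clear matrix."""
    def h(p):
        return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    n = len(M)
    vals = [h(M[i][j]) for i in range(n) for j in range(i + 1, n)]
    return sum(vals) / len(vals)


def permute_columns(points, rng):
    """Assumed null model: shuffle each feature independently across samples."""
    cols = [list(c) for c in zip(*points)]
    for c in cols:
        rng.shuffle(c)
    return [list(row) for row in zip(*cols)]


def estimate_k(points, k_values, seed=0):
    """Return the k with the largest (null entropy - observed entropy) gap, plus all gaps."""
    rng = random.Random(seed)
    null = permute_columns(points, rng)
    gaps = {}
    for k in k_values:
        observed = mean_entropy(consensus_matrix(points, k, rng))
        reference = mean_entropy(consensus_matrix(null, k, rng))
        gaps[k] = reference - observed
    return max(gaps, key=gaps.get), gaps
```

A call such as `estimate_k(samples, range(2, 7))`, with `samples` given as a list of equal-length numeric lists, returns the selected k together with the per-k gaps. Because plain k-means can settle in poor local optima, a real analysis would likely use a more robust base clusterer and more resampling runs than this sketch does.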

Similar articles

Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach.

Bioinformatics. 2007 Jul 1;23(13):1607-15. doi: 10.1093/bioinformatics/btm158. Epub 2007 May 5.

Stability-based validation of clustering solutions.

Neural Comput. 2004 Jun;16(6):1299-323. doi: 10.1162/089976604773717621.

Knowledge-assisted recognition of cluster boundaries in gene expression data.

Artif Intell Med. 2005 Sep-Oct;35(1-2):171-83. doi: 10.1016/j.artmed.2005.02.007.

Randomized maps for assessing the reliability of patients clusters in DNA microarray data analyses.

Artif Intell Med. 2006 Jun;37(2):85-109. doi: 10.1016/j.artmed.2006.03.005. Epub 2006 May 23.

Importance of proximity measures in clustering of cancer and miRNA datasets: proposal of an automated framework.

Mol Biosyst. 2016 Oct 18;12(11):3478-3501. doi: 10.1039/c6mb00609d.

Modified fuzzy gap statistic for estimating preferable number of clusters in fuzzy k-means clustering.

J Biosci Bioeng. 2008 Mar;105(3):273-81. doi: 10.1263/jbb.105.273.

Application of Multi-SOM clustering approach to macrophage gene expression analysis.

Infect Genet Evol. 2009 May;9(3):328-36. doi: 10.1016/j.meegid.2008.09.009. Epub 2008 Oct 17.

Graph-based consensus clustering for class discovery from gene expression data.

Bioinformatics. 2007 Nov 1;23(21):2888-96. doi: 10.1093/bioinformatics/btm463. Epub 2007 Sep 14.

A New Validity Index Based on Fuzzy Energy and Fuzzy Entropy Measures in Fuzzy Clustering Problems.

Entropy (Basel). 2020 Oct 23;22(11):1200. doi: 10.3390/e22111200.
