A ground truth based comparative study on clustering of gene expression data.

Authors

Zhu Yitan, Wang Zuyi, Miller David J, Clarke Robert, Xuan Jianhua, Hoffman Eric P, Wang Yue

Affiliation

Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA.

Publication

Front Biosci. 2008 May 1;13:3839-49. doi: 10.2741/2972.

DOI:10.2741/2972

PMID:18508478

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4737472/

Abstract

Given the variety of available clustering methods for gene expression data analysis, it is important to develop an appropriate and rigorous validation scheme to assess the performance and limitations of the most widely used clustering algorithms. In this paper, we present a ground truth based comparative study on the functionality, accuracy, and stability of five data clustering methods, namely hierarchical clustering, K-means clustering, self-organizing maps, standard finite normal mixture fitting, and a caBIG toolkit (VIsual Statistical Data Analyzer--VISDA), tested on sample clustering of seven published microarray gene expression datasets and one synthetic dataset. We examined the performance of these algorithms in both data-sufficient and data-insufficient cases using quantitative performance measures, including cluster number detection accuracy and mean and standard deviation of partition accuracy. The experimental results showed that VISDA, an interactive coarse-to-fine maximum likelihood fitting algorithm, is a solid performer on most of the datasets, while K-means clustering and self-organizing maps optimized by the mean squared compactness criterion generally produce more stable solutions than the other methods.
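The performance measures named in the abstract (cluster number detection accuracy, and the mean and standard deviation of partition accuracy) can be illustrated with a small sketch. The abstract does not give the exact definitions, so everything here is an assumption: partition accuracy is taken to be the fraction of samples correctly grouped under the best one-to-one matching of clusters to ground-truth classes, detection accuracy is the fraction of runs that recover the true cluster count, and the sample standard deviation is used. The function names are hypothetical and are not taken from the paper or from VISDA.

```python
from itertools import permutations
from statistics import mean, stdev


def partition_accuracy(true_labels, cluster_labels):
    """Fraction of samples correctly grouped under the best one-to-one
    mapping of cluster ids to ground-truth classes.

    Assumed definition (not stated in the abstract). Brute force over
    mappings, so only suitable for a small number of clusters.
    Labels must not be None, which is reserved for "unmatched".
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    # Pad with None so that surplus clusters can remain unmatched.
    size = max(len(classes), len(clusters))
    padded = classes + [None] * (size - len(classes))
    best_hits = 0
    for perm in permutations(padded, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(
            1 for t, c in zip(true_labels, cluster_labels) if mapping[c] == t
        )
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


def summarize_runs(true_labels, runs):
    """Mean and sample standard deviation of partition accuracy over
    repeated clustering runs (needs at least two runs)."""
    accuracies = [partition_accuracy(true_labels, run) for run in runs]
    return mean(accuracies), stdev(accuracies)


def cluster_number_detection_accuracy(true_k, detected_ks):
    """Fraction of runs whose detected cluster count equals the true count."""
    return sum(1 for k in detected_ks if k == true_k) / len(detected_ks)
```

For example, `summarize_runs(true_labels, runs)` takes one ground-truth label list and a list of label lists, one per clustering run. A lower standard deviation corresponds to the more stable behaviour the abstract attributes to K-means clustering and self-organizing maps.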
