A ground truth based comparative study on clustering of gene expression data.

Authors

Zhu Yitan, Wang Zuyi, Miller David J, Clarke Robert, Xuan Jianhua, Hoffman Eric P, Wang Yue

Affiliation

Department of Electrical and Computer Engineering, Virginia Polytechnic and State University, Arlington, VA 22203, USA.

Publication details

Front Biosci. 2008 May 1;13:3839-49. doi: 10.2741/2972.

DOI: 10.2741/2972
PMID: 18508478
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4737472/

Abstract

Given the variety of available clustering methods for gene expression data analysis, it is important to develop an appropriate and rigorous validation scheme to assess the performance and limitations of the most widely used clustering algorithms. In this paper, we present a ground truth based comparative study on the functionality, accuracy, and stability of five data clustering methods, namely hierarchical clustering, K-means clustering, self-organizing maps, standard finite normal mixture fitting, and a caBIG toolkit (VIsual Statistical Data Analyzer--VISDA), tested on sample clustering of seven published microarray gene expression datasets and one synthetic dataset. We examined the performance of these algorithms in both data-sufficient and data-insufficient cases using quantitative performance measures, including cluster number detection accuracy and mean and standard deviation of partition accuracy. The experimental results showed that VISDA, an interactive coarse-to-fine maximum likelihood fitting algorithm, is a solid performer on most of the datasets, while K-means clustering and self-organizing maps optimized by the mean squared compactness criterion generally produce more stable solutions than the other methods.
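The quantitative measures named in the abstract (cluster number detection accuracy, and the mean and standard deviation of partition accuracy) can be illustrated with a minimal sketch. The abstract does not define these measures, so the definitions below are assumptions: partition accuracy is taken as the fraction of samples correctly assigned under the best one-to-one mapping of clusters to ground-truth classes, and cluster number detection accuracy as the fraction of runs that recover the true number of classes. The function names `partition_accuracy` and `summarize_runs` are hypothetical and are not taken from the paper or from VISDA.

```python
from itertools import permutations
from statistics import mean, pstdev


def partition_accuracy(true_labels, cluster_labels):
    """Fraction of samples matched to their ground-truth class under the best
    one-to-one cluster-to-class mapping (assumed definition, brute force)."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    classes = list(dict.fromkeys(true_labels))      # unique classes, in order
    clusters = list(dict.fromkeys(cluster_labels))  # unique cluster ids, in order
    # Pad with None so that surplus clusters can remain unmatched.
    padded = classes + [None] * max(0, len(clusters) - len(classes))
    best_hits = 0
    for assigned in permutations(padded, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_labels))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


def summarize_runs(true_labels, runs):
    """Summarize repeated clustering runs against one ground-truth labeling."""
    true_k = len(set(true_labels))
    accuracies = [partition_accuracy(true_labels, run) for run in runs]
    correct_k = sum(len(set(run)) == true_k for run in runs)
    return {
        "cluster_number_detection_accuracy": correct_k / len(runs),
        "partition_accuracy_mean": mean(accuracies),
        # Population standard deviation; the sample version is an equally
        # plausible choice and the abstract does not say which was used.
        "partition_accuracy_std": pstdev(accuracies),
    }


# Toy ground truth with two classes and three hypothetical clustering runs.
truth = [0, 0, 0, 1, 1, 1]
runs = [
    [1, 1, 1, 0, 0, 0],  # same partition with swapped ids
    [0, 0, 1, 1, 1, 1],  # one sample misplaced
    [0, 0, 0, 1, 1, 2],  # one extra cluster detected
]
summary = summarize_runs(truth, runs)
print(summary)
```

The brute-force search over mappings is only practical for a handful of clusters; for larger cluster counts, an assignment solver such as the Hungarian algorithm would be the usual substitute. A low standard deviation across runs corresponds to what the abstract calls a stable solution.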

Similar articles

1. A ground truth based comparative study on clustering of gene expression data. Front Biosci. 2008 May 1;13:3839-49. doi: 10.2741/2972.
2. caBIG VISDA: modeling, visualization, and discovery for cluster analysis of genomic data. BMC Bioinformatics. 2008 Sep 18;9:383. doi: 10.1186/1471-2105-9-383.
3. VISDA: an open-source caBIG analytical tool for data clustering and beyond. Bioinformatics. 2007 Aug 1;23(15):2024-7. doi: 10.1093/bioinformatics/btm290. Epub 2007 May 31.
4. Bayesian hierarchical clustering for studying cancer gene expression data with unknown statistics. PLoS One. 2013 Oct 23;8(10):e75748. doi: 10.1371/journal.pone.0075748. eCollection 2013.
5. A dynamically growing self-organizing tree (DGSOT) for hierarchical clustering gene expression profiles. Bioinformatics. 2004 Nov 1;20(16):2605-17. doi: 10.1093/bioinformatics/bth292. Epub 2004 May 6.
6. Comparisons and validation of statistical clustering techniques for microarray gene expression data. Bioinformatics. 2003 Mar 1;19(4):459-66. doi: 10.1093/bioinformatics/btg025.
7. Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm. Bioinformatics. 2006 Jan 1;22(1):58-67. doi: 10.1093/bioinformatics/bti746. Epub 2005 Oct 27.
8. Microarray data clustering based on temporal variation: FCV with TSD preclustering. Appl Bioinformatics. 2003;2(1):35-45.
9. Phenotypic-specific gene module discovery using a diagnostic tree and caBIG VISDA. Conf Proc IEEE Eng Med Biol Soc. 2006;2006:5767-70. doi: 10.1109/IEMBS.2006.260031.
10. An interactive approach to multiobjective clustering of gene expression patterns. IEEE Trans Biomed Eng. 2013 Jan;60(1):35-41. doi: 10.1109/TBME.2012.2220765. Epub 2012 Sep 28.

Cited by

1. Ground truth clustering is not the optimum clustering. Sci Rep. 2025 Mar 17;15(1):9223. doi: 10.1038/s41598-025-90865-9.
2. The transcriptomic revolution and radiation biology. Int J Radiat Biol. 2022;98(3):428-438. doi: 10.1080/09553002.2021.1987562. Epub 2021 Oct 11.
3. Dose-related gene expression changes in forebrain following acute, low-level chlorpyrifos exposure in neonatal rats. Toxicol Appl Pharmacol. 2010 Oct 15;248(2):144-55. doi: 10.1016/j.taap.2010.07.026. Epub 2010 Aug 5.
4. Copy number analysis indicates monoclonal origin of lethal metastatic prostate cancer. Nat Med. 2009 May;15(5):559-65. doi: 10.1038/nm.1944. Epub 2009 Apr 12.
5. caBIG VISDA: modeling, visualization, and discovery for cluster analysis of genomic data. BMC Bioinformatics. 2008 Sep 18;9:383. doi: 10.1186/1471-2105-9-383.
