Bagging to improve the accuracy of a clustering procedure.

Author information

Dudoit Sandrine, Fridlyand Jane

Affiliation

Division of Biostatistics, School of Public Health, University of California, Berkeley, 140 Earl Warren Hall, 7360, Berkeley, CA 94720-7360, USA.

Publication information

Bioinformatics. 2003 Jun 12;19(9):1090-9. doi: 10.1093/bioinformatics/btg038.

Abstract

MOTIVATION

Microarray technology is increasingly applied in biological and medical research to address a wide range of problems, such as the classification of tumors. An important statistical question associated with tumor classification is the identification of new tumor classes using gene expression profiles. Essential aspects of this clustering problem include identifying accurate partitions of the tumor samples into clusters and assessing the confidence of cluster assignments for individual samples.

RESULTS

Two new resampling methods, inspired by bagging in prediction, are proposed to improve and assess the accuracy of a given clustering procedure. In these ensemble methods, a partitioning clustering procedure is applied to bootstrap learning sets, and the resulting multiple partitions are combined either by voting or by creating a new dissimilarity matrix. As in prediction, the motivation behind bagging is to reduce variability in the partitioning results via averaging. The performance of the new and existing methods was compared using simulated data and gene expression data from two recently published cancer microarray studies. The bagged clustering procedures were in general at least as accurate as, and often substantially more accurate than, a single application of the partitioning clustering procedure. A valuable by-product of bagged clustering is the set of cluster votes, which can be used to assess the confidence of cluster assignments for individual observations.
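
The voting variant described above can be illustrated with a minimal sketch. It is not the authors' implementation: a plain k-means stands in for the partitioning clustering procedure (the abstract does not name one), and the function names `kmeans`, `align_to_reference`, and `bagged_clustering` are hypothetical. Two further assumptions are made for illustration: cluster labels from each bootstrap run are aligned to a reference clustering of the full data by trying every label permutation, and an observation receives a vote only from bootstrap sets that contain it.

```python
import random
from itertools import permutations


def sqdist(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means (assumed stand-in for the partitioning procedure)."""
    # Farthest-first initialisation: avoids picking duplicated bootstrap points.
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def align_to_reference(ref, labels, k):
    """Relabel `labels` with the permutation that overlaps most with `ref`."""
    best = max(
        permutations(range(k)),
        key=lambda perm: sum(perm[lab] == r for lab, r in zip(labels, ref)),
    )
    return [best[lab] for lab in labels]


def bagged_clustering(points, k, n_boot=50, seed=0):
    """Return (bagged labels, per-observation vote share for the winning label)."""
    rng = random.Random(seed)
    n = len(points)
    reference = kmeans(points, k, rng)  # clustering of the full data set
    votes = [[0] * k for _ in range(n)]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap learning set
        boot_labels = kmeans([points[i] for i in idx], k, rng)
        aligned = align_to_reference([reference[i] for i in idx], boot_labels, k)
        for i, lab in set(zip(idx, aligned)):  # one vote per in-bag observation
            votes[i][lab] += 1
    bagged, confidence = [], []
    for i in range(n):
        total = sum(votes[i])
        if total == 0:  # never drawn: fall back to the reference label
            bagged.append(reference[i])
            confidence.append(0.0)
        else:
            top = max(range(k), key=lambda c: votes[i][c])
            bagged.append(top)
            confidence.append(votes[i][top] / total)
    return bagged, confidence


# Toy usage: two well-separated groups of ten points each.
data_rng = random.Random(1)
points = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(10)]
points += [(data_rng.gauss(10, 0.5), data_rng.gauss(10, 0.5)) for _ in range(10)]
labels, confidence = bagged_clustering(points, k=2)
print(labels)
print([round(c, 2) for c in confidence])
```

The vote share returned alongside each label corresponds to the "cluster votes" by-product mentioned in the abstract: values near 1 suggest a stable assignment, while lower values flag observations whose cluster membership changes across bootstrap sets. The exhaustive permutation search is only practical for a small number of clusters.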

SUPPLEMENTARY INFORMATION

For supplementary information on datasets, analyses, and software, consult http://www.stat.berkeley.edu/~sandrine and http://www.bioconductor.org.
