New resampling method for evaluating stability of clusters.

Authors

Gana Dresen Irina M, Boes Tanja, Huesing Johannes, Neuhaeuser Markus, Joeckel Karl-Heinz

Affiliation

Institut für Medizinische Informatik, Biometrie und Epidemiologie, Universitaetsklinikum Essen, Germany.

Publication

BMC Bioinformatics. 2008 Jan 24;9:42. doi: 10.1186/1471-2105-9-42.

DOI: 10.1186/1471-2105-9-42
PMID: 18218074
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2265265/
Abstract

BACKGROUND

Hierarchical clustering is widely applied in the analysis of microarray gene expression data. Assessing cluster stability is a major challenge in clustering procedures, and statistical methods are required to distinguish real clusters from random ones. Several methods for assessing cluster stability have been published, including resampling methods such as the bootstrap. We propose a new resampling method based on continuous weights to assess the stability of clusters in hierarchical clustering. Whereas bootstrapping loses approximately one third of the original items, continuous weights avoid zero elements and instead allow non-integer diagonal elements. This retains the full dimensionality of the space, i.e. each variable of the original data set is represented in the resampled data.

RESULTS

Comparing continuous weights with bootstrapping on real datasets and in simulation studies reveals the advantage of continuous weights, especially when the dataset has only a few observations, few differentially expressed genes, and a low fold change among the differentially expressed genes.

CONCLUSION

We recommend the use of continuous weights in small as well as in large datasets because, according to our results, they perform at least as well as conventional bootstrapping and in some cases surpass it.
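
The contrast drawn in the Background can be illustrated with a small sketch. A bootstrap sample of n items corresponds to integer weights (the number of times each item is drawn), and the expected share of items with weight zero is (1 - 1/n)^n, which approaches 1/e ≈ 0.37 for large n; this is the "approximately one third" that is lost. The abstract does not specify how the continuous weights are generated, so the scheme below (uniform draws on a hypothetical interval, rescaled to sum to n) is only an assumption chosen to show strictly positive, non-integer weights. All function names and parameters are likewise illustrative and not taken from the paper.

```python
import math
import random


def bootstrap_weights(n, rng):
    """Integer weights: how often each of n items appears in one bootstrap sample."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return counts


def continuous_weights(n, rng, low=0.5, high=1.5):
    """Strictly positive, non-integer weights rescaled to sum to n.

    ASSUMPTION: the uniform(low, high) draw is a placeholder; the abstract
    does not state which distribution the authors use.
    """
    draws = [rng.uniform(low, high) for _ in range(n)]
    total = sum(draws)
    return [n * d / total for d in draws]


def zero_fraction(weights):
    """Share of items that drop out of the resample (weight exactly zero)."""
    return sum(1 for w in weights if w == 0) / len(weights)


def weighted_sq_distance(x, y, weights):
    """Squared Euclidean distance with one weight per variable.

    With integer bootstrap counts as weights, this equals the distance
    computed on the variables resampled with replacement.
    """
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))


rng = random.Random(0)
n = 10_000
boot = bootstrap_weights(n, rng)
cont = continuous_weights(n, rng)

print("bootstrap: share of items lost =", zero_fraction(boot))
print("theory:    (1 - 1/n)^n         =", (1 - 1 / n) ** n)
print("continuous: share of items lost =", zero_fraction(cont))
```

Feeding either weight vector into `weighted_sq_distance` yields a perturbed distance matrix that a hierarchical clustering routine could consume; repeating this many times and counting how often a cluster reappears is one common way to score stability, though the paper's exact procedure may differ.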

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd42/2265265/27b4360cb959/1471-2105-9-42-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd42/2265265/48b2b430bc82/1471-2105-9-42-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd42/2265265/588d7adb25bf/1471-2105-9-42-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd42/2265265/52134e67d31f/1471-2105-9-42-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd42/2265265/4d535f7b8752/1471-2105-9-42-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd42/2265265/3cf41b1067ef/1471-2105-9-42-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd42/2265265/30ea8dbd935b/1471-2105-9-42-7.jpg
