Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach.

Authors

Pihur Vasyl, Datta Susmita, Datta Somnath

Affiliation

Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202, USA.

Publication information

Bioinformatics. 2007 Jul 1;23(13):1607-15. doi: 10.1093/bioinformatics/btm158. Epub 2007 May 5.

DOI:10.1093/bioinformatics/btm158
PMID:17483500
Abstract

MOTIVATION

Biologists often employ clustering techniques in the exploratory phase of microarray data analysis to discover relevant biological groupings. Given the availability of numerous clustering algorithms in the machine-learning literature, a user might want to select the one that performs best for their data set or application. Various validation measures have been proposed over the years to judge the quality of the clusters produced by a given clustering algorithm, including their biological relevance. Unfortunately, a given clustering algorithm can perform poorly under one validation measure while outperforming many other algorithms under another. A manual synthesis of results from multiple validation measures is nearly impossible in practice, especially when a large number of clustering algorithms are to be compared using several measures. An automated and objective way of reconciling the rankings is needed.

RESULTS

Using a Monte Carlo cross-entropy algorithm, we successfully combine the ranks of a set of clustering algorithms under consideration via a weighted aggregation that optimizes a distance criterion. The proposed weighted rank aggregation allows for a far more objective and automated assessment of clustering results than a simple visual inspection. We illustrate our procedure using one simulated and three real gene expression data sets from various platforms, ranking a total of 11 clustering algorithms through a combined examination of 10 different validation measures. The aggregate rankings were found for a given number of clusters k and also for an entire range of k.
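
The idea of aggregating several ranked lists with a Monte Carlo cross-entropy search can be illustrated with a minimal sketch. This is not the authors' R code, and the abstract does not say which distance criterion they optimize. The sketch assumes a weighted Spearman footrule distance with one importance weight per validation measure. All function names, parameters, and defaults (`footrule`, `objective`, `ce_rank_aggregate`, `n_samples`, `elite_frac`, `smooth`) are hypothetical choices made for illustration.

```python
import random


def footrule(candidate, ranked_list):
    """Spearman footrule: sum of absolute position differences between two orderings."""
    pos = {item: i for i, item in enumerate(ranked_list)}
    return sum(abs(i - pos[item]) for i, item in enumerate(candidate))


def objective(candidate, lists, weights):
    """Weighted sum of footrule distances from the candidate to every input list."""
    return sum(w * footrule(candidate, lst) for lst, w in zip(lists, weights))


def sample_ordering(prob, rng):
    """Draw one ordering: fill positions in turn, skipping items already placed."""
    remaining = list(range(len(prob)))
    order = []
    for row in prob:
        # Small floor keeps the total weight strictly positive (assumption for safety).
        w = [row[j] + 1e-12 for j in remaining]
        pick = rng.choices(remaining, weights=w, k=1)[0]
        remaining.remove(pick)
        order.append(pick)
    return order


def ce_rank_aggregate(lists, weights, n_samples=200, elite_frac=0.1,
                      smooth=0.7, n_iter=30, seed=0):
    """Cross-entropy Monte Carlo search for an ordering with a small weighted distance."""
    rng = random.Random(seed)
    items = sorted(lists[0])
    n = len(items)
    # prob[i][j]: probability that item j is placed at position i (starts uniform).
    prob = [[1.0 / n] * n for _ in range(n)]
    n_elite = max(1, int(elite_frac * n_samples))
    best, best_score = None, float("inf")
    for _ in range(n_iter):
        samples = [sample_ordering(prob, rng) for _ in range(n_samples)]
        scored = sorted(
            ((objective([items[j] for j in s], lists, weights), s) for s in samples),
            key=lambda t: t[0],
        )
        if scored[0][0] < best_score:
            best_score, best = scored[0][0], scored[0][1]
        # Shift the sampling distribution toward the lowest-distance (elite) samples.
        elite = [s for _, s in scored[:n_elite]]
        for i in range(n):
            counts = [0] * n
            for s in elite:
                counts[s[i]] += 1
            for j in range(n):
                prob[i][j] = (1 - smooth) * prob[i][j] + smooth * counts[j] / n_elite
    return [items[j] for j in best], best_score


if __name__ == "__main__":
    # Three hypothetical validation measures ranking four hypothetical algorithms.
    rankings = [
        ["kmeans", "pam", "hier", "som"],
        ["pam", "kmeans", "hier", "som"],
        ["kmeans", "hier", "pam", "som"],
    ]
    print(ce_rank_aggregate(rankings, weights=[1.0, 1.0, 1.0]))
```

In this sketch each input list is one validation measure's ranking of the same set of algorithms. Raising a measure's weight pulls the aggregate ordering toward that measure's ranking. Sampling rather than enumerating orderings is what keeps the search feasible when there are many algorithms.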

AVAILABILITY

R code for all validation measures and rank aggregation is available from the authors upon request.

SUPPLEMENTARY INFORMATION

Supplementary information is available at http://www.somnathdatta.org/Supp/RankCluster/supp.htm.

Similar articles

1
Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach.
Bioinformatics. 2007 Jul 1;23(13):1607-15. doi: 10.1093/bioinformatics/btm158. Epub 2007 May 5.
2
Detecting clusters of different geometrical shapes in microarray gene expression data.
Bioinformatics. 2005 May 1;21(9):1927-34. doi: 10.1093/bioinformatics/bti251. Epub 2005 Jan 12.
3
Graph-based consensus clustering for class discovery from gene expression data.
Bioinformatics. 2007 Nov 1;23(21):2888-96. doi: 10.1093/bioinformatics/btm463. Epub 2007 Sep 14.
4
A new algorithm for comparing and visualizing relationships between hierarchical and flat gene expression data clusterings.
Bioinformatics. 2005 Nov 1;21(21):3993-9. doi: 10.1093/bioinformatics/bti644. Epub 2005 Sep 1.
5
A mixture model with random-effects components for clustering correlated gene-expression profiles.
Bioinformatics. 2006 Jul 15;22(14):1745-52. doi: 10.1093/bioinformatics/btl165. Epub 2006 May 3.
6
A multi-stage approach to clustering and imputation of gene expression profiles.
Bioinformatics. 2007 Apr 15;23(8):998-1005. doi: 10.1093/bioinformatics/btm053. Epub 2007 Feb 18.
7
Divisive Correlation Clustering Algorithm (DCCA) for grouping of genes: detecting varying patterns in expression profiles.
Bioinformatics. 2008 Jun 1;24(11):1359-66. doi: 10.1093/bioinformatics/btn133. Epub 2008 Apr 10.
8
Clustering of change patterns using Fourier coefficients.
Bioinformatics. 2008 Jan 15;24(2):184-91. doi: 10.1093/bioinformatics/btm568. Epub 2007 Nov 19.
9
Hierarchical tree snipping: clustering guided by prior knowledge.
Bioinformatics. 2007 Dec 15;23(24):3335-42. doi: 10.1093/bioinformatics/btm526. Epub 2007 Nov 7.
10
Analysis of a Gibbs sampler method for model-based clustering of gene expression data.
Bioinformatics. 2008 Jan 15;24(2):176-83. doi: 10.1093/bioinformatics/btm562. Epub 2007 Nov 22.

Cited by

1
A comparison of normalization methods for the expression of genes associated with oxidative stress in the liver of sheep.
BMC Genom Data. 2025 Jul 31;26(1):53. doi: 10.1186/s12863-025-01345-y.
2
Selection and Validation of Reference Genes in Under Abiotic Stresses, MeJA Treatment, and in Different Tissues.
Int J Mol Sci. 2025 Mar 11;26(6):2483. doi: 10.3390/ijms26062483.
3
Evolution of the Swiss pork production systems and logistics: the impact on infectious disease resilience.
Sci Rep. 2025 Mar 6;15(1):7842. doi: 10.1038/s41598-025-92011-x.
4
Exposome-Wide Association Study of Body Mass Index Using a Novel Meta-Analytical Approach for Random Forest Models.
Environ Health Perspect. 2024 Jun;132(6):67007. doi: 10.1289/EHP13393. Epub 2024 Jun 18.
5
Towards more precise automatic analysis: a systematic review of deep learning-based multi-organ segmentation.
Biomed Eng Online. 2024 Jun 8;23(1):52. doi: 10.1186/s12938-024-01238-8.
6
and Gene Expression in Kidneys and Their Involvement in Calcium and Phosphate Metabolism in Laying Hens.
Animals (Basel). 2024 May 8;14(10):1407. doi: 10.3390/ani14101407.
7
Identification and validation of stable reference genes for RT-qPCR analyses of Kobresia littledalei seedlings.
BMC Plant Biol. 2024 May 11;24(1):389. doi: 10.1186/s12870-024-04924-w.
8
Selection of the optimal personalized treatment from multiple treatments with right-censored multivariate outcome measures.
J Appl Stat. 2023 Jan 10;51(5):891-912. doi: 10.1080/02664763.2022.2164759. eCollection 2024.
9
Detection of spondylosis deformans in thoracolumbar and lumbar lateral X-ray images of dogs using a deep learning network.
Front Vet Sci. 2024 Feb 15;11:1334438. doi: 10.3389/fvets.2024.1334438. eCollection 2024.
10
Optimal Personalized Treatment Selection with Multivariate Outcome Measures in a Multiple Treatment Case.
Commun Stat Simul Comput. 2023;52(12):5773-5787. doi: 10.1080/03610918.2021.1999473. Epub 2021 Nov 15.