Automated calibration of consensus weighted distance-based clustering approaches using sharp.

Affiliations

Department of Epidemiology and Biostatistics, Imperial College London, Norfolk place, London W2 1PG, United Kingdom.

Department of Mathematics, Imperial College London, London SW7 2RH, United Kingdom.

Publication information

Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad635.

DOI: 10.1093/bioinformatics/btad635
PMID: 37847776
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10627366/

Abstract

MOTIVATION

In consensus clustering, a clustering algorithm is used in combination with a subsampling procedure to detect stable clusters. Previous studies on both simulated and real data suggest that consensus clustering outperforms native algorithms.
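
To make the subsampling idea concrete, here is a minimal sketch of how a consensus (co-membership) matrix can be built. It is not the sharp package's implementation: the choice of k-means as the base algorithm, the function names `kmeans` and `consensus_matrix`, the 50% subsample fraction and the toy data are all assumptions made for illustration.

```python
import random


def kmeans(points, k, rng, n_iter=20):
    """Plain Lloyd's k-means on a list of coordinate tuples; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def consensus_matrix(points, k, n_subsamples=50, fraction=0.5, seed=0):
    """Proportion of subsamples in which each pair of items is co-clustered,
    among the subsamples where both items were drawn."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(k, int(fraction * n))
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), size)
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Toy data (hypothetical): two well-separated groups of six items each.
group_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8)]
group_b = [(x + 20.0, y + 20.0) for x, y in group_a]
points = group_a + group_b
M = consensus_matrix(points, k=2)
```

Entries of `M` close to 1 indicate pairs that are grouped together in nearly every subsample in which both appear; stable clusters show up as blocks of such pairs.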

RESULTS

We extend here consensus clustering to allow for attribute weighting in the calculation of pairwise distances using existing regularized approaches. We propose a procedure for the calibration of the number of clusters (and regularization parameter) by maximizing the sharp score, a novel stability score calculated directly from consensus clustering outputs, making it extremely computationally competitive. Our simulation study shows better clustering performances of (i) approaches calibrated by maximizing the sharp score compared to existing calibration scores and (ii) weighted compared to unweighted approaches in the presence of features that do not contribute to cluster definition. Application on real gene expression data measured in lung tissue reveals clear clusters corresponding to different lung cancer subtypes.
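
The effect of attribute weighting on pairwise distances can be illustrated with a small sketch. The abstract does not specify the weighting scheme, so this example assumes a weighted Euclidean distance with hand-picked weights; in the paper the weights come from existing regularized approaches, which are not reproduced here. The function names and the toy observations are hypothetical.

```python
import math


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_j w_j * (x_j - y_j)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def pairwise_weighted_distances(observations, weights):
    """Full matrix of weighted distances between all pairs of observations."""
    n = len(observations)
    return [
        [weighted_distance(observations[i], observations[j], weights) for j in range(n)]
        for i in range(n)
    ]


# Toy data (hypothetical): the first attribute separates two groups
# (items 0-1 versus items 2-3); the second attribute is pure noise.
obs = [(0.0, 5.0), (0.1, -3.0), (4.0, 4.9), (4.1, -3.1)]

unweighted = pairwise_weighted_distances(obs, (1.0, 1.0))
weighted = pairwise_weighted_distances(obs, (1.0, 0.0))  # noise attribute down-weighted to zero
```

With equal weights the noise attribute dominates, so item 0 sits closer to item 2 (another group) than to item 1 (its own group). Setting the noise weight to zero restores the expected ordering, which is the situation where the abstract reports that weighted approaches outperform unweighted ones.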

AVAILABILITY AND IMPLEMENTATION

The R package sharp (version ≥1.4.3) is available on CRAN at https://CRAN.R-project.org/package=sharp.
