Powerful significance testing for unbalanced clusters.

Authors

Keefe Thomas H, Marron J S

Affiliation

Department of Statistics & O.R., UNC-Chapel Hill.

Publication information

J Comput Graph Stat. 2025 Apr 16. doi: 10.1080/10618600.2025.2469756.

DOI: 10.1080/10618600.2025.2469756
PMID: 40857487
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12338451/

Abstract

Clustering methods are popular for revealing structure in data, particularly in the high-dimensional setting common to contemporary data science. A central question is "are the clusters really there?" One pioneering method in statistical cluster validation is SigClust, but it is severely underpowered in the important setting where the candidate clusters have unbalanced sizes, such as in rare subtypes of disease. We show why this is the case and propose a remedy that is powerful in both the unbalanced and balanced settings, using a novel generalization of k-means clustering. We illustrate the value of our method using a high-dimensional dataset of gene expression in kidney cancer patients. A Python implementation is available at https://github.com/thomaskeefe/sigclust.
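To make the question "are the clusters really there?" concrete, here is a minimal sketch of a Monte Carlo cluster significance test. It is not the method proposed in the paper and does not use the API of the linked repository. The function names `two_means_cluster_index` and `monte_carlo_p_value` are hypothetical, and the null model is a deliberate simplification: a single Gaussian whose independent coordinates take the data's per-coordinate variances. The test statistic assumed here is the 2-means cluster index (within-cluster sum of squares divided by total sum of squares), where smaller values suggest stronger clustering.

```python
import random


def _sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(rows):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def two_means_cluster_index(points, n_iter=50):
    """Within-cluster SS / total SS for a 2-means split (Lloyd's algorithm)."""
    mean = _centroid(points)
    total_ss = sum(_sqdist(p, mean) for p in points)
    if total_ss == 0:
        raise ValueError("all points are identical")
    # Illustrative initialisation: cut the highest-variance coordinate at its mean.
    spreads = [sum((p[j] - mean[j]) ** 2 for p in points) for j in range(len(mean))]
    j0 = spreads.index(max(spreads))
    labels = [int(p[j0] > mean[j0]) for p in points]

    def centers_for(labs):
        return [_centroid([p for p, lab in zip(points, labs) if lab == k]) for k in (0, 1)]

    for _ in range(n_iter):
        centers = centers_for(labels)
        new_labels = [int(_sqdist(p, centers[1]) < _sqdist(p, centers[0])) for p in points]
        # Stop on convergence, or if an update would empty one cluster.
        if new_labels == labels or len(set(new_labels)) < 2:
            break
        labels = new_labels
    centers = centers_for(labels)
    within_ss = sum(_sqdist(p, centers[lab]) for p, lab in zip(points, labels))
    return within_ss / total_ss


def monte_carlo_p_value(points, n_sim=100, seed=0):
    """Share of simulated single-Gaussian datasets that look at least as clustered."""
    rng = random.Random(seed)
    n = len(points)
    mean = _centroid(points)
    # Assumed null: independent Gaussian coordinates with the sample standard deviations.
    sds = [(sum((p[j] - mean[j]) ** 2 for p in points) / (n - 1)) ** 0.5
           for j in range(len(mean))]
    observed = two_means_cluster_index(points)
    hits = 0
    for _ in range(n_sim):
        null = [[rng.gauss(0.0, s) for s in sds] for _ in range(n)]
        if two_means_cluster_index(null) <= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (1 + hits) / (1 + n_sim)
```

Calling `monte_carlo_p_value(data)` on a list of equal-length numeric rows returns an estimated p-value; a small value indicates that the observed 2-means split is tighter than expected from a single Gaussian population under the assumptions above.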
