Unsupervised learning methods for efficient geographic clustering and identification of disease disparities with applications to county-level colorectal cancer incidence in California.

Affiliations

Department of Mathematics, Box 8205, North Carolina State University, Raleigh, NC, 27695-8205, USA.

Department of Economics and Finance, La Sapienza University of Rome, 00185, Roma, Italy.

Publication information

Health Care Manag Sci. 2022 Dec;25(4):574-589. doi: 10.1007/s10729-022-09604-5. Epub 2022 Jun 23.

DOI: 10.1007/s10729-022-09604-5
PMID: 35732967

Abstract

Many public health policymaking questions involve data subsets representing application-specific attributes and geographic location. We develop and evaluate standard and tailored techniques for clustering via unsupervised learning (UL) algorithms on such amalgamated (dual-domain) data sets. The aim of the associated algorithms is to identify geographically efficient clusters that also maximize the number of statistically significant differences in disease incidence and demographic variables across top clusters. Two standard UL approaches, k-means with k-means++ initialization (k++) and the standard self-organizing map (SSOM), are considered along with a new, tailored version of the SOM (TSOM). The TSOM algorithm optimizes a customized objective function whose terms promote geographic cohesion within individual clusters while also maximizing the number of differences across clusters; two hyper-parameters control the relative weighting of the geographic and attribute subspaces in a non-Euclidean distance measure within the clustering problem. The performance of these three techniques (k++, SSOM, TSOM) is compared and evaluated on a data set of colorectal cancer incidence in the state of California, at the level of individual counties. Clusters are visualized via choropleth maps, and ordered graphs are used to illustrate disparities in disease incidence among four identity groups. While all three approaches performed well, the TSOM identified the largest number of disease and demographic disparities while also yielding more geographically efficient top clusters. Techniques presented in this study are relevant to applications including the delivery of health care resources and identifying disparities among identity groups, and to questions involving coordination between county- and state-level policymakers.
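
To make the idea of weighting the geographic and attribute subspaces concrete, the following is a minimal sketch of dual-domain k-means with k-means++ seeding. It is not the authors' implementation and it is not the TSOM: the abstract does not specify the distance measure, so the weighted sum of squared Euclidean distances used here is an assumed stand-in, and the function names, the weights `w_geo` and `w_attr`, and the toy data are all hypothetical.

```python
import numpy as np


def dual_domain_sq_dist(X_geo, X_attr, C_geo, C_attr, w_geo, w_attr):
    """Weighted squared distance from n points to k centers; returns shape (n, k).

    Assumed form (not from the paper): w_geo * ||geo diff||^2 + w_attr * ||attr diff||^2.
    """
    d_geo = ((X_geo[:, None, :] - C_geo[None, :, :]) ** 2).sum(axis=2)
    d_attr = ((X_attr[:, None, :] - C_attr[None, :, :]) ** 2).sum(axis=2)
    return w_geo * d_geo + w_attr * d_attr


def kmeanspp_seeds(X_geo, X_attr, k, w_geo, w_attr, rng):
    """k-means++ seeding: each later seed is drawn with probability proportional
    to its squared distance from the nearest seed chosen so far."""
    n = X_geo.shape[0]
    seeds = [int(rng.integers(n))]
    while len(seeds) < k:
        d = dual_domain_sq_dist(X_geo, X_attr, X_geo[seeds], X_attr[seeds],
                                w_geo, w_attr).min(axis=1)
        total = d.sum()
        # Fall back to uniform sampling if every point coincides with a seed.
        probs = d / total if total > 0 else np.full(n, 1.0 / n)
        seeds.append(int(rng.choice(n, p=probs)))
    return seeds


def dual_domain_kmeans(X_geo, X_attr, k, w_geo=1.0, w_attr=1.0, n_iter=100, seed=0):
    """Lloyd iterations under the weighted dual-domain distance."""
    rng = np.random.default_rng(seed)
    seeds = kmeanspp_seeds(X_geo, X_attr, k, w_geo, w_attr, rng)
    C_geo = X_geo[seeds].astype(float)
    C_attr = X_attr[seeds].astype(float)
    labels = None
    for _ in range(n_iter):
        dist = dual_domain_sq_dist(X_geo, X_attr, C_geo, C_attr, w_geo, w_attr)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            mask = labels == j
            if mask.any():  # keep the old center if a cluster becomes empty
                C_geo[j] = X_geo[mask].mean(axis=0)
                C_attr[j] = X_attr[mask].mean(axis=0)
    return labels, C_geo, C_attr


# Toy data (made up): 20 "counties" in two well-separated groups.
rng = np.random.default_rng(1)
geo = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(10.0, 0.1, (10, 2))])
attr = np.concatenate([rng.normal(40.0, 1.0, 10), rng.normal(60.0, 1.0, 10)])[:, None]
labels, C_geo, C_attr = dual_domain_kmeans(geo, attr, k=2, w_geo=1.0, w_attr=0.5)
print(labels)
```

Raising `w_geo` relative to `w_attr` pulls clusters toward geographic compactness, while the reverse favors similarity in incidence and demographic attributes. In this sketch the weights are fixed by hand; the abstract describes them as hyper-parameters of the TSOM.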

Cited by

1
AI analysis of medical images at scale as a health disparities probe: a feasibility demonstration using chest radiographs.
ArXiv. 2025 Apr 8:arXiv:2504.05990v1.