Unsupervised learning methods for efficient geographic clustering and identification of disease disparities with applications to county-level colorectal cancer incidence in California.

Author affiliations

Department of Mathematics, Box 8205, North Carolina State University, Raleigh, NC, 27695-8205, USA.

Department of Economics and Finance, La Sapienza University of Rome, 00185, Roma, Italy.

Publication information

Health Care Manag Sci. 2022 Dec;25(4):574-589. doi: 10.1007/s10729-022-09604-5. Epub 2022 Jun 23.

Abstract

Many public health policymaking questions involve data subsets representing application-specific attributes and geographic location. We develop and evaluate standard and tailored techniques for clustering such amalgamated (dual-domain) data sets with unsupervised learning (UL) algorithms. The aim of these algorithms is to identify geographically efficient clusters that also maximize the number of statistically significant differences in disease incidence and demographic variables across the top clusters. Two standard UL approaches, k-means with k++ initialization (k++) and the standard self-organizing map (SSOM), are considered along with a new, tailored version of the SOM (TSOM). The TSOM algorithm optimizes a customized objective function whose terms promote the geographic cohesion of individual clusters while maximizing the number of differences across clusters. It also uses two hyper-parameters that control the relative weighting of the geographic and attribute subspaces in a non-Euclidean distance measure within the clustering problem. The performance of the three techniques (k++, SSOM, TSOM) is compared and evaluated on a data set of county-level colorectal cancer incidence in the state of California. Clusters are visualized via choropleth maps, and ordered graphs illustrate disparities in disease incidence among four identity groups. While all three approaches performed well, the TSOM identified the largest number of disease and demographic disparities while also yielding more geographically efficient top clusters. The techniques presented in this study are relevant to applications including the delivery of health care resources and the identification of disparities among identity groups, and to questions involving coordination between county- and state-level policymakers.
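To make the idea of weighting geographic and attribute subspaces concrete, the following is a minimal Python sketch of k-means clustering with k++ seeding on dual-domain records. It is not the authors' TSOM or their distance measure: the weighted squared-distance form, the names `dual_distance`, `kpp_seeds`, `cluster`, `w_geo`, and `w_attr`, and the four toy records are all assumptions made for illustration only.

```python
import math
import random


def dual_distance(p, q, w_geo=1.0, w_attr=1.0):
    """Weighted distance between two records of the form (geo, attr).

    Assumed form (not taken from the paper): the squared distances in the
    geographic and attribute subspaces are weighted separately, then summed.
    """
    geo_sq = sum((a - b) ** 2 for a, b in zip(p[0], q[0]))
    attr_sq = sum((a - b) ** 2 for a, b in zip(p[1], q[1]))
    return math.sqrt(w_geo * geo_sq + w_attr * attr_sq)


def kpp_seeds(points, k, rng, w_geo, w_attr):
    """k++ seeding: draw each new seed with probability proportional to its
    squared distance from the nearest seed chosen so far."""
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        d2 = [min(dual_distance(p, s, w_geo, w_attr) ** 2 for s in seeds)
              for p in points]
        seeds.append(rng.choices(points, weights=d2, k=1)[0])
    return seeds


def mean_point(members):
    """Component-wise mean of a non-empty list of (geo, attr) records."""
    n = len(members)
    geo = tuple(sum(m[0][i] for m in members) / n
                for i in range(len(members[0][0])))
    attr = tuple(sum(m[1][i] for m in members) / n
                 for i in range(len(members[0][1])))
    return (geo, attr)


def cluster(points, k, w_geo=1.0, w_attr=1.0, iters=20, seed=0):
    """Lloyd iterations using the weighted dual-domain distance."""
    rng = random.Random(seed)
    centers = kpp_seeds(points, k, rng, w_geo, w_attr)
    labels = []
    for _ in range(iters):
        # Assignment step: nearest center under the weighted distance.
        labels = [min(range(k),
                      key=lambda j: dual_distance(p, centers[j], w_geo, w_attr))
                  for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean_point(members)
    return labels


# Hypothetical toy records: ((x, y) location, (incidence-like attribute,)).
points = [
    ((0.0, 0.0), (1.0,)),
    ((0.1, 0.0), (1.1,)),
    ((5.0, 5.0), (3.0,)),
    ((5.1, 5.0), (3.1,)),
]
labels = cluster(points, k=2, w_geo=1.0, w_attr=1.0)
```

Raising `w_geo` relative to `w_attr` in this sketch favors geographically compact clusters, while the reverse favors clusters that are homogeneous in the attribute values; this is the kind of trade-off the two hyper-parameters described in the abstract are meant to control.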

