Department of Population Health Sciences, University of Wisconsin Population Health Institute, University of Wisconsin-Madison, Madison, Wisconsin (Drs Pollock, Gennuso and Givens); and Department of Population Health Sciences, University of Wisconsin-Madison, Madison, Wisconsin (Dr Gangnon).
J Public Health Manag Pract. 2024;30(6):E319-E328. doi: 10.1097/PHH.0000000000002034. Epub 2024 Sep 20.
Context: Population health rankings can catalyze health improvement by drawing attention to areas in relative need of improvement and by summarizing complex information in a manner understood by almost everyone. However, ranks also have unintended consequences, such as being interpreted as "hard truths" even where the differences between ranks may not be significant. Communication about the uncertainty in ranks needs to improve so that ranks are interpreted accurately. The solutions most commonly discussed in the literature are modeling approaches that minimize statistical noise or borrow strength from covariates. However, complex models can limit communication and implementation, especially for broad audiences.
Objective: To explore data-informed grouping (cluster analysis) as an easier-to-understand, empirical technique for accounting for rank imprecision that can be communicated effectively both numerically and visually.
Design: Cluster analysis, specifically k-means clustering with Wasserstein (earth mover's) distance, was explored as a way to identify natural, meaningful groupings and gaps in the data distribution underlying the County Health Rankings (CHR) health outcomes ranks.
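As a rough illustration of the technique named above, the following sketch clusters one-dimensional distributions with k-means under the 2-Wasserstein distance. It is not the authors' implementation, and the abstract does not say how the distance was applied. The sketch assumes each county is represented by an equal-sized set of draws for its health outcomes score (for example, bootstrap or posterior samples). Under that assumption, the squared 2-Wasserstein distance between two counties equals the mean squared difference of their sorted draws, so ordinary k-means on the sorted draws is k-means in Wasserstein space. The function names and the simulated data are hypothetical.

```python
import numpy as np


def quantile_embedding(samples):
    """Sort each row so row i holds the empirical quantiles of unit i."""
    return np.sort(np.asarray(samples, dtype=float), axis=1)


def wasserstein_kmeans(samples, k, n_iter=100):
    """Lloyd's k-means on 1-D empirical distributions (rows of `samples`).

    Assumption: every row has the same number of draws, so the squared
    2-Wasserstein distance is the mean squared gap between sorted draws.
    """
    q = quantile_embedding(samples)
    # Deterministic start: order units by mean score, split into k chunks,
    # and use each chunk's average quantile function as an initial center.
    order = np.argsort(q.mean(axis=1))
    centers = np.array([q[chunk].mean(axis=0)
                        for chunk in np.array_split(order, k)])
    for _ in range(n_iter):
        # Squared W2 distance from every unit to every center.
        d2 = ((q[:, None, :] - centers[None, :, :]) ** 2).mean(axis=2)
        labels = d2.argmin(axis=1)
        # A cluster's Wasserstein barycenter is its mean quantile function.
        new_centers = np.array([
            q[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Simulated (not real CHR) data: 12 hypothetical counties, 200 draws each,
# generated around three well-separated score levels.
rng = np.random.default_rng(0)
true_levels = np.repeat([-2.0, 0.0, 2.0], 4)
draws = rng.normal(loc=true_levels[:, None], scale=0.3, size=(12, 200))

labels, centers = wasserstein_kmeans(draws, k=3)
print(labels)
```

In this toy setup, counties whose score distributions overlap heavily land in the same group, which is the sense in which grouping can stand in for rank uncertainty. Choosing the number of groups is a separate step that the sketch does not address.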
Setting: County-level health outcomes from the 2022 CHR.
Participants: 3082 counties that were ranked in the 2022 CHR.
Main Outcome Measure: Data-informed health groups.
Results: Cluster analysis identified 30 health groupings among counties nationwide, with cluster sizes ranging from 9 to 184 counties. On average, states had 16 identified clusters, ranging from 3 in Delaware and Hawaii to 27 in Virginia. The number of clusters per state was associated with the number of counties in the state and with the state's population. The method helped address many of the issues that arise from providing rank estimates alone.
Conclusions: Public health practitioners can use this information to understand uncertainty in ranks, visualize distances between county ranks, see which counties are not meaningfully different from one another, and compare county performance with that of peer counties.