Bhavnani Suresh K, Zhang Weibin, Bao Daniel, Raji Mukaila, Ajewole Veronica, Hunter Rodney, Kuo Yong-Fang, Schmidt Susanne, Pappadis Monique R, Smith Elise, Bokov Alex, Reistetter Timothy, Visweswaran Shyam, Downer Brian
School of Public and Population Health, Department of Biostatistics & Data Science, University of Texas Medical Branch, Galveston, TX, United States.
Department of Radiology, Houston Methodist, Houston, TX, United States.
J Med Internet Res. 2025 Feb 11;27:e48775. doi: 10.2196/48775.
Social determinants of health (SDoH), such as financial resources and housing stability, account for between 30% and 55% of people's health outcomes. While many studies have identified strong associations between specific SDoH and health outcomes, little is known about how SDoH co-occur to form subtypes critical for designing targeted interventions. Such analysis has only now become possible through the All of Us program.
This study aims to analyze the All of Us dataset to address two research questions: (1) What is the range of, and what are the responses to, survey questions related to SDoH? and (2) How do SDoH co-occur to form subtypes, and what are their risks for adverse health outcomes?
For question 1, an expert panel analyzed the range of and responses to SDoH questions across 6 surveys in the full All of Us dataset (N=372,397; version 6). For question 2, due to systematic missingness and uneven granularity of questions across the surveys, we selected all participants with valid and complete SDoH data and used inverse probability weighting to adjust their imbalance in demographics. Next, an expert panel grouped the SDoH questions into SDoH factors to enable more consistent granularity. To identify the subtypes, we used bipartite modularity maximization for identifying SDoH biclusters and measured their significance and replicability. Next, we measured their association with 3 outcomes (depression, delayed medical care, and emergency room visits in the last year). Finally, the expert panel inferred the subtype labels, potential mechanisms, and targeted interventions.
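The quantity that bipartite modularity maximization tries to increase can be illustrated with a short sketch. It assumes Barber's formulation of bipartite modularity on an unweighted participant-by-factor network; the abstract does not state which variant or optimizer the authors used. The function name, node names, and toy data are hypothetical. The code only scores a given co-clustering and does not search for the best one.

```python
from collections import Counter


def bipartite_modularity(edges, participant_cluster, factor_cluster):
    """Barber-style bipartite modularity Q for a given co-clustering.

    edges: list of (participant, factor) pairs, one per reported SDoH factor.
    participant_cluster / factor_cluster: dicts mapping each node to a cluster id.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("need at least one edge")
    participant_degree = Counter(p for p, _ in edges)
    factor_degree = Counter(f for _, f in edges)

    # Observed number of edges that fall inside a bicluster.
    observed = sum(
        1 for p, f in edges if participant_cluster[p] == factor_cluster[f]
    )

    # Expected number of such edges if edges were placed at random
    # while preserving every node's degree.
    expected = sum(
        kp * kf / m
        for p, kp in participant_degree.items()
        for f, kf in factor_degree.items()
        if participant_cluster[p] == factor_cluster[f]
    )
    return (observed - expected) / m


# Hypothetical toy data: four participants, two SDoH factors.
edges = [("p1", "f1"), ("p2", "f1"), ("p3", "f2"), ("p4", "f2")]
participants = {"p1": 0, "p2": 0, "p3": 1, "p4": 1}
factors = {"f1": 0, "f2": 1}
q_split = bipartite_modularity(edges, participants, factors)

# Putting every node in a single cluster yields no structure beyond chance.
q_single = bipartite_modularity(
    edges, {p: 0 for p in participants}, {f: 0 for f in factors}
)
```

A maximization procedure would compare many candidate co-clusterings and keep the one with the largest Q. Significance is then typically judged by comparing that Q with the values obtained from randomized versions of the network.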
The question 1 analysis identified 110 SDoH questions across 4 surveys covering all 5 domains in Healthy People 2030. As the SDoH questions varied in granularity, they were categorized by an expert panel into 18 SDoH factors. The question 2 analysis (n=12,913; d=18) identified 4 biclusters with significant biclusteredness (Q=0.13; random-Q=0.11; z=7.5; P<.001) and significant replication (real Rand index=0.88; random Rand index=0.62; P<.001). Each subtype had significant associations with specific outcomes and had meaningful interpretations and potential targeted interventions. For example, the Socioeconomic barriers subtype included 6 SDoH factors (eg, not employed and food insecurity) and had a significantly higher odds ratio (4.2, 95% CI 3.5-5.1; P<.001) for depression when compared to other subtypes. The expert panel inferred implications of the results for designing interventions and health care policies based on SDoH subtypes.
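As a rough illustration of the replication measure reported above, the sketch below computes the plain (unadjusted) Rand index between two cluster assignments of the same items. The abstract does not describe how the replication samples were drawn or how the random baseline was generated, so the function name and the example labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two items
    together, or both put them apart.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    if len(labels_a) < 2:
        raise ValueError("need at least two items to form a pair")
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical cluster labels for the same four items in two runs.
run_1 = [0, 0, 1, 1]
run_2 = [0, 0, 1, 2]
score = rand_index(run_1, run_2)
```

A value of 1 means the two clusterings group every pair of items identically. Comparing the observed value with the value obtained from randomized data, as the abstract reports, indicates whether the agreement exceeds what chance alone would produce.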
This study identified SDoH subtypes that had statistically significant biclusteredness and replicability. Each subtype had significant associations with specific adverse health outcomes and translational implications for targeted SDoH interventions and health care policies. However, because of the high degree of systematic missingness, the analysis should be repeated as the data become more complete, using our generalizable and scalable machine learning code available on the All of Us workbench.