Chen Xingyu, Kitchen Christopher, Kharrazi Hadi
Biomedical Informatics and Data Science, Johns Hopkins School of Medicine, Baltimore, MD 21205, United States.
Center for Population Health IT, Johns Hopkins School of Public Health, Baltimore, MD 21205, United States.
JAMIA Open. 2025 Aug 18;8(4):ooaf093. doi: 10.1093/jamiaopen/ooaf093. eCollection 2025 Aug.
To evaluate and compare different dimensionality reduction techniques for quantifying housing conditions as a social determinant of health (SDOH) across various geographic levels in the United States.
A total of 15 housing characteristics from the American Community Survey data were analyzed at county, ZIP code, and Census tract levels. The robustness of 3 dimensionality reduction techniques was assessed in reducing the 15 housing characteristics into 1 housing score. These summarization methods included principal component analysis (PCA), t-distributed stochastic neighbor embedding (tSNE), and uniform manifold approximation and projection (UMAP). We visualized geographic distributions of the housing scores, assessed methodological discrepancies between the techniques, and analyzed agreement between housing characteristic variability and housing score variability.
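As a rough illustration of the summarization step described above, the following minimal sketch reduces 15 characteristics to a single score using the first principal component. It is not the authors' pipeline: the data are synthetic rather than drawn from the American Community Survey, the function names (`standardize`, `first_component`, `housing_scores`) are hypothetical, and a plain power iteration stands in for a library PCA routine. tSNE and UMAP are not covered here.

```python
import math
import random


def standardize(rows):
    """Z-score each column so no characteristic dominates by scale."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(p)
    ]
    return [
        [(r[j] - means[j]) / sds[j] if sds[j] > 0 else 0.0 for j in range(p)]
        for r in rows
    ]


def first_component(z, iters=500, tol=1e-12):
    """Leading eigenvector of the correlation matrix via power iteration."""
    n, p = len(z), len(z[0])
    corr = [
        [sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    v = [1.0 / math.sqrt(p)] * p  # uniform unit-length starting vector
    for _ in range(iters):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        delta = max(abs(x - y) for x, y in zip(w, v))
        v = w
        if delta < tol:
            break
    # Rayleigh quotient gives the eigenvalue; the trace gives total variance.
    eigenvalue = sum(
        v[a] * sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)
    )
    total = sum(corr[a][a] for a in range(p))
    return v, eigenvalue / total


def housing_scores(rows):
    """Project each area onto the first principal component."""
    z = standardize(rows)
    loadings, explained = first_component(z)
    scores = [sum(l * x for l, x in zip(loadings, r)) for r in z]
    return scores, loadings, explained


# Synthetic stand-in: 200 areas, 15 characteristics driven by one latent factor.
rng = random.Random(0)
latent = [rng.gauss(0, 1) for _ in range(200)]
rows = [
    [t * (0.5 + 0.03 * j) + rng.gauss(0, 1) for j in range(15)]
    for t in latent
]
scores, loadings, explained = housing_scores(rows)
print(f"areas scored: {len(scores)}, variance explained by PC1: {explained:.2f}")
```

Standardizing first means the component is extracted from the correlation matrix, which is a common choice when characteristics are measured on different scales; whether the study did so is an assumption here. The sign of a principal component is arbitrary, so a score built this way would still need to be oriented (for example, so that higher values mean worse housing conditions) before mapping.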
The selected dimensionality reduction methods generated housing scores that demonstrated acceptable face validity when visualized through choropleth maps. PCA provided the most stable and consistent results across geographic levels. PCA also yielded the highest correlation between the variability of the underlying housing characteristics and the variability of the summarized housing score.
Data-driven summarization techniques provide an alternative approach to traditional expert-based indices in capturing housing conditions as a single SDOH factor. In this study, among the different summarized housing scores, the PCA-generated score offered superior robustness, persistent data structure, and higher stability across years.
Principal component analysis was identified as the most reliable and interpretable approach for summarizing housing conditions across geographic levels. These findings contribute to the methodological foundation required to develop robust SDOH measures that can inform public health policies and address health disparities.