Department of Family Medicine, Oregon Health & Science University, Portland, Oregon, USA.
Biostatistics Group, School of Public Health, Oregon Health & Science University - Portland State University, Portland, Oregon, USA.
Health Serv Res. 2023 Oct;58(5):1119-1130. doi: 10.1111/1475-6773.14154. Epub 2023 Mar 28.
OBJECTIVE: To develop and validate prediction models for inference of Latino nativity to advance health equity research.
DATA SOURCES/STUDY SETTING: This study used electronic health record (EHR) data from 19,985 Latino children with self-reported country of birth who sought care between January 1, 2012, and December 31, 2018, at 456 community health centers (CHCs) across 15 states, along with census-tract geocoded neighborhood composition and surname data.
STUDY DESIGN: We constructed prediction models for estimating Latino nativity within a broad machine learning framework (Super Learner) and evaluated their performance. Outcomes were binary indicators of nativity (US-born vs. foreign-born) and of Latino country of birth (Mexican, Cuban, Guatemalan). Model performance was compared using the area under the receiver operating characteristic curve (AUC) in an externally withheld patient sample.
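The AUC comparison described above can be made concrete. As a minimal sketch, assuming a withheld sample with binary labels (1 = foreign-born, or 1 = a given country of birth) and model-predicted probabilities, the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting one half. The hypothetical function `auc` below computes this through the rank-sum (Mann-Whitney U) identity in pure Python; it illustrates the metric and is not the authors' code.

```python
from typing import List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for the positive class (e.g., foreign-born), 0 otherwise.
    scores: predicted probabilities from any fitted model.
    Tied scores receive their average rank, so a tie between a positive
    and a negative case contributes one half.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    if any(y not in (0, 1) for y in labels):
        raise ValueError("labels must be 0 or 1")
    n = len(scores)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")

    # Assign 1-based ranks in ascending score order, averaging over ties.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U = (sum of positive ranks) - n_pos(n_pos + 1)/2; AUC = U / (n_pos * n_neg).
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Toy withheld sample (invented values, for illustration only).
y_holdout = [1, 0, 1, 0, 0]
p_holdout = [0.9, 0.2, 0.6, 0.6, 0.1]
print(auc(y_holdout, p_holdout))
```

The rank-based form runs in O(n log n), which matters for samples of tens of thousands of patients, whereas comparing every positive-negative pair grows quadratically. The same quantity is available as `roc_auc_score` in scikit-learn.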
DATA COLLECTION/EXTRACTION METHODS: Census surname lists, census neighborhood composition, and Forebears administrative data were linked to EHR data.
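The linkage step described above can be sketched as a keyed lookup. This is a minimal illustration, assuming each EHR record carries a surname and a geocoded census tract, and that the surname sources (Census lists, Forebears) and the neighborhood data have already been reduced to per-surname and per-tract proportions. The table contents, the field names `surname_pct_hispanic` and `tract_pct_foreign_born`, and the normalization rule are all hypothetical, not the study's actual variables.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PatientRecord:
    patient_id: str
    surname: str
    census_tract: str  # geocoded tract identifier; format is an assumption


def surname_key(surname: str) -> str:
    # Hypothetical normalization: trim whitespace and upper-case, to match
    # a surname table assumed to be keyed on upper-case names.
    return surname.strip().upper()


def link_features(
    patient: PatientRecord,
    surname_table: Dict[str, float],
    tract_table: Dict[str, float],
) -> Dict[str, Optional[float]]:
    """Attach surname- and neighborhood-level predictors to one EHR record.

    Unmatched surnames or tracts yield None so that a downstream model
    (or an imputation step) can handle missingness explicitly.
    """
    return {
        "surname_pct_hispanic": surname_table.get(surname_key(patient.surname)),
        "tract_pct_foreign_born": tract_table.get(patient.census_tract),
    }


# Invented example tables; real values would come from the linked sources.
surnames = {"GARCIA": 0.92}
tracts = {"41051000100": 0.18}
record = PatientRecord("p1", " Garcia ", "41051000100")
print(link_features(record, surnames, tracts))
```

Returning `None` rather than a default value keeps unmatched records visible, so the modeling step can decide how to treat them instead of silently absorbing a guessed value.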
PRINCIPAL FINDINGS: Of the 19,985 Latino patients, 10.7% reported a non-US country of birth (5.1% Mexican, 4.7% Guatemalan, 0.8% Cuban). Overall, the prediction models for nativity showed outstanding performance in external validation (US-born vs. foreign-born: AUC = 0.90; Mexican vs. non-Mexican: AUC = 0.89; Guatemalan vs. non-Guatemalan: AUC = 0.95; Cuban vs. non-Cuban: AUC = 0.99).
CONCLUSIONS: Among the challenges facing health equity researchers in health services is the absence of methods for data disaggregation, specifically methods for determining Latino country of birth (nativity) to inform disparities research. Recent interest in more robust health equity research has called attention to the importance of data disaggregation. In a multistate network of CHCs, we used multilevel inputs from EHR data linked to surname and community data to develop and validate novel prediction models that infer Latino nativity from available EHR data for health disparities research in primary care and health services research. These models represent a potentially significant methodologic advance for studying this population.