Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, USA.
Department of Public Health, University of Massachusetts Lowell, Lowell, MA, USA.
BMC Public Health. 2022 Aug 9;22(1):1515. doi: 10.1186/s12889-022-13809-2.
Electronic health record (EHR) data are increasingly used to monitor population health because of their timeliness, granularity, and large sample sizes. While EHR data are often sufficient to estimate disease prevalence and trends for large geographic areas, the same accuracy and precision may not carry over to smaller areas that are sparsely represented by non-random samples.
We developed small-area estimation models that combine EHR data from MDPHnet, an EHR-based public health surveillance network in Massachusetts, with American Community Survey data and state hospitalization data. We estimated the municipality-specific prevalence of asthma, diabetes, hypertension, obesity, and smoking in each of the 351 municipalities in Massachusetts in 2016. Model estimates were compared against Behavioral Risk Factor Surveillance System (BRFSS) state and small-area estimates for 2016.
Integrating progressively more variables into the prediction models generally reduced mean absolute error (MAE) relative to municipality-level BRFSS small-area estimates: asthma (2.24% MAE crude, 1.02% MAE modeled), diabetes (3.13% MAE crude, 3.48% MAE modeled), hypertension (2.60% MAE crude, 1.48% MAE modeled), obesity (4.92% MAE crude, 4.07% MAE modeled), and smoking (5.33% MAE crude, 2.99% MAE modeled). Diabetes was the exception, with a slightly higher MAE for the modeled estimates than for the crude estimates. Correlation between modeled estimates and BRFSS estimates for the 13 municipalities in Massachusetts covered by BRFSS's 500 Cities ranged from 81.9% (obesity) to 96.7% (diabetes).
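For readers unfamiliar with the two comparison metrics reported above, the following is a minimal sketch of how MAE and a correlation coefficient can be computed between a set of municipality-level estimates and a reference set. It assumes the correlation is a Pearson correlation, which the abstract does not state, and the function names and all numbers in it are hypothetical illustrations, not data or code from the study.

```python
from math import sqrt


def mean_absolute_error(estimates, reference):
    """Average absolute difference between paired estimates (in percentage points)."""
    if len(estimates) != len(reference) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - r) for e, r in zip(estimates, reference)) / len(estimates)


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two paired sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical prevalence values (%) for four municipalities; not study data.
reference = [10.0, 12.0, 9.0, 11.0]   # stand-in for the reference estimates
crude = [13.0, 9.5, 11.0, 14.0]       # stand-in for unadjusted EHR prevalence
modeled = [10.5, 11.0, 9.5, 12.0]     # stand-in for model-based prevalence

print(f"MAE crude:   {mean_absolute_error(crude, reference):.3f}")
print(f"MAE modeled: {mean_absolute_error(modeled, reference):.3f}")
print(f"Correlation (modeled vs reference): {pearson_correlation(modeled, reference):.3f}")
```

A lower MAE means the estimates sit closer, on average, to the reference values in absolute terms, while a high correlation only shows that the two sets rank and scale municipalities similarly. The two metrics can therefore disagree for the same condition.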
Small-area estimation using EHR data is feasible and generates estimates comparable to BRFSS state and small-area estimates. Integrating EHR data with survey data can provide timely and accurate disease monitoring tools for areas with sparse data coverage.