Szymonifka Jackie, Conderino Sarah, Cigolle Christine, Ha Jinkyung, Kabeto Mohammed, Yu Jaehong, Dodson John A, Thorpe Lorna, Blaum Caroline, Zhong Judy
Division of Biostatistics, Department of Population Health, NYU Langone Health, New York, New York, USA.
Division of Epidemiology, Department of Population Health, NYU Langone Health, New York, New York, USA.
JAMIA Open. 2020 Dec 5;3(4):583-592. doi: 10.1093/jamiaopen/ooaa059. eCollection 2020 Dec.
Electronic health records (EHRs) have become a common data source for clinical risk prediction, offering large sample sizes and frequently sampled metrics. Hospital-based EHR samples may differ notably from traditional cohort samples: EHR data are often not population-representative random samples, even for particular diseases, because EHR patients tend to be sicker and to use more healthcare, whereas cohort studies tend to enroll healthier subjects, who are more likely to participate. We investigate heterogeneity between EHR- and cohort-based inferences, including incidence rates, risk factor identification and quantification, and absolute risks.
This is a retrospective cohort study of older patients with type 2 diabetes mellitus (T2DM), using EHR data from New York University Langone Health ambulatory care (NYULH-EHR, 2009-2017) and data from the Health and Retirement Study (HRS, 1995-2014) to study subsequent cardiovascular disease (CVD) risk. We applied the same eligibility criteria, outcome definitions, and demographic covariates and biomarkers to both datasets. We compared subsequent CVD incidence rates, hazard ratios (HRs) of risk factors, and the discrimination and calibration performance of CVD risk scores.
The estimated subsequent total CVD incidence rates were 37.5 and 90.6 per 1000 person-years since T2DM onset in HRS and NYULH-EHR, respectively. HR estimates were comparable between the two datasets for most demographic covariates and biomarkers. Common CVD risk scores underestimated the observed total CVD risk in NYULH-EHR.
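To make the incidence-rate comparison concrete, the following minimal Python sketch shows how a rate per 1000 person-years is computed and how the two reported rates relate as a crude ratio. The function names and the event and person-year counts are hypothetical illustrations, not the study's data or code; only the rates 37.5 and 90.6 come from the abstract.

```python
def incidence_rate_per_1000(events: int, person_years: float) -> float:
    """Crude incidence rate: events per 1000 person-years of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 1000.0 * events / person_years


def rate_ratio(rate_a: float, rate_b: float) -> float:
    """Crude ratio of two incidence rates (rate_a relative to rate_b)."""
    if rate_b <= 0:
        raise ValueError("rate_b must be positive")
    return rate_a / rate_b


# Hypothetical counts chosen only to illustrate the arithmetic:
# 75 events over 2000 person-years gives 37.5 per 1000 person-years.
example_rate = incidence_rate_per_1000(events=75, person_years=2000.0)

# Rates reported in the abstract (per 1000 person-years since T2DM onset).
hrs_rate = 37.5
ehr_rate = 90.6
crude_ratio = rate_ratio(ehr_rate, hrs_rate)

print(f"example rate: {example_rate:.1f} per 1000 person-years")
print(f"NYULH-EHR vs HRS crude rate ratio: {crude_ratio:.2f}")
```

On these figures the crude NYULH-EHR rate is roughly 2.4 times the HRS rate. This ratio is unadjusted and ignores differences in follow-up period and case mix between the two samples.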
EHR-estimated HRs of demographic and major clinical risk factors for CVD were mostly consistent with estimates from a national cohort, despite the higher incidence and absolute risk of total CVD in the EHR sample.