Marker David A, Hilton Charity, Zelko Jacob, Duke Jon, Rolka Deborah, Kaufmann Rachel, Boyd Richard
Marker Consulting, Columbia, Maryland, United States of America.
Georgia Tech Research Institute, Atlanta, Georgia, United States of America.
J Surv Stat Methodol. 2024 Nov;12(5):1515-1530. doi: 10.1093/jssam/smae036.
Government statistical offices worldwide are under pressure to produce statistics rapidly and for more detailed geographies, in order to compete with unofficial estimates available from web-based big data sources or from private companies. Commonly suggested sources of improved health information are electronic health records (EHRs) and medical claims data. These data sources are collectively known as real world data (RWD) because they are generated from routine health care processes, and they are available for millions of patients. RWD can clearly provide estimates that are more timely and less expensive to produce, but a key question is whether those estimates are accurate. To test this, we took advantage of a unique health data source that includes a full range of sociodemographic variables and compared estimates weighted using all of those potential weighting variables with estimates weighted using only age and sex (as is common with most RWD sources). We show that not accounting for other variables can produce misleading, and quite inaccurate, health estimates.
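The mechanism behind this finding can be illustrated with a toy post-stratification example. This is a minimal sketch, not the authors' method: the cells, counts and prevalences are invented, and simple cell-based post-stratification stands in for whatever weighting the study actually used. A hypothetical sample under-represents a low-education group that has a higher prevalence of some condition; it is weighted once using age and sex alone and once using age, sex and education.

```python
from collections import defaultdict

# Hypothetical population totals per (age, sex, education) cell.
# Every age-sex cell has 1,000 people, split evenly by education.
POPULATION = {
    (age, sex, edu): 500
    for age in ("18-44", "45+")
    for sex in ("F", "M")
    for edu in ("low", "high")
}

# Hypothetical sample: (respondents, respondents with the condition) per cell.
# Low-education people are under-represented (20 of every 100 sampled)
# and have a higher prevalence of the condition.
SAMPLE = {}
for sex in ("F", "M"):
    SAMPLE[("18-44", sex, "low")] = (20, 6)    # 30% prevalence
    SAMPLE[("18-44", sex, "high")] = (80, 8)   # 10% prevalence
    SAMPLE[("45+", sex, "low")] = (20, 8)      # 40% prevalence
    SAMPLE[("45+", sex, "high")] = (80, 16)    # 20% prevalence


def poststratified_prevalence(population, sample, key):
    """Post-stratify on the cells produced by `key`; return the weighted prevalence.

    Each respondent in a collapsed cell gets weight N_cell / n_cell, so the
    weighted sample reproduces the population total of every collapsed cell.
    """
    pop = defaultdict(int)    # population total per collapsed cell
    resp = defaultdict(int)   # respondents per collapsed cell
    cases = defaultdict(int)  # respondents with the condition per collapsed cell
    for cell, n_pop in population.items():
        pop[key(cell)] += n_pop
    for cell, (n, y) in sample.items():
        resp[key(cell)] += n
        cases[key(cell)] += y
    weighted_cases = sum(pop[c] / resp[c] * cases[c] for c in pop)
    return weighted_cases / sum(pop.values())


# Weighting on age and sex only (collapse away education).
est_age_sex = poststratified_prevalence(POPULATION, SAMPLE, key=lambda c: c[:2])
# Weighting on age, sex and education (use the full cells).
est_full = poststratified_prevalence(POPULATION, SAMPLE, key=lambda c: c)

print(f"age x sex only:          {est_age_sex:.3f}")  # 0.190
print(f"age x sex x education:   {est_full:.3f}")     # 0.250
```

Weighting on age and sex leaves the education imbalance untouched inside each age-sex cell, so the estimate is pulled toward the lower prevalence of the over-represented group (0.190). Adding education to the weighting cells recovers the population value of 0.250, but only because, by construction here, prevalence within each full cell is the same in the sample as in the population.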