Dahlen Alex, Deng Yaowei, Charu Vivek
Department of Biostatistics, School of Global Public Health, New York University, New York, NY.
Quantitative Sciences Unit, Department of Medicine, Stanford University School of Medicine, Stanford, CA.
Am J Epidemiol. 2025 Jun 30. doi: 10.1093/aje/kwaf142.
Commercial healthcare claims datasets are a non-random sample of the US population, which affects generalizability. Rigorous comparisons of claims-derived results to ground-truth data that quantify external validity bias are lacking. Our goal is to (1) quantify the external validity of commercial healthcare claims data, and (2) evaluate how socioeconomic/demographic factors are related to the bias. We analyzed inpatient discharge records occurring between 01/01/2019 and 12/31/2019 in five states (California, Iowa, Maryland, Massachusetts, and New Jersey) and compared rates (per person-year) of the 250 most common inpatient procedures between claims and reference data for each target population. We used the Merative™ MarketScan® Commercial Database for the claims data and the State Inpatient Databases (SID) and the US Census as reference. For a target population of all Americans, commercial healthcare claims underestimated the rate of overall inpatient discharges by 23.1%. The extent of bias varied across procedures, with the rates of ~25% of procedures being underestimated by a factor of 2. Socioeconomic factors were significantly associated with the magnitude of bias ($R^2 = 69.4\%$, $p < 0.001$). When the target population was restricted to commercially insured Americans, the bias decreased substantially (1.4% of procedures were biased by more than a factor of 2), but some variation across procedures remained.
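The comparison described above can be illustrated with a minimal sketch. It assumes bias is measured as the ratio of the claims-derived rate to the reference rate, which is one plausible reading of the abstract and not necessarily the authors' exact estimator. The function names (`rate`, `relative_bias`, `off_by_factor`) and all counts are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: hypothetical counts, not data from the study.

def rate(events, person_years):
    """Event rate per person-year."""
    return events / person_years


def relative_bias(claims_rate, reference_rate):
    """Signed relative bias of the claims rate; negative means underestimation."""
    return claims_rate / reference_rate - 1.0


def off_by_factor(claims_rate, reference_rate, factor=2.0):
    """True if the claims rate differs from the reference by more than `factor`
    in either direction (assumed definition of 'biased by more than a factor of 2')."""
    ratio = claims_rate / reference_rate
    return ratio > factor or ratio < 1.0 / factor


# Hypothetical procedures: (claims events, claims person-years,
#                           reference events, reference person-years)
procedures = {
    "proc_A": (769, 10_000, 1_000, 10_000),
    "proc_B": (40, 1_000, 100, 1_000),
    "proc_C": (90, 1_000, 100, 1_000),
}

flags = []
for name, (c_ev, c_py, r_ev, r_py) in procedures.items():
    c_rate, r_rate = rate(c_ev, c_py), rate(r_ev, r_py)
    flags.append(off_by_factor(c_rate, r_rate))
    print(f"{name}: relative bias = {relative_bias(c_rate, r_rate):+.1%}")

# Share of procedures whose rate is off by more than a factor of 2
share_off = sum(flags) / len(flags)
print(f"share off by more than a factor of 2: {share_off:.1%}")
```

Under this reading, a relative bias of -23.1% corresponds to a claims rate that is 76.9% of the reference rate, and "underestimated by a factor of 2" corresponds to a ratio of 0.5 or less.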