Columbia University School of Nursing, New York, NY, United States.
Ann Epidemiol. 2024 Jun;94:120-126. doi: 10.1016/j.annepidem.2024.05.003. Epub 2024 May 10.
To evaluate the effectiveness of Bayesian Improved Surname Geocoding (BISG) and Bayesian Improved First Name Surname Geocoding (BIFSG) in estimating race and ethnicity, and how they influence odds ratios for preterm birth.
We analyzed electronic health record (EHR) data from hospital birth admissions (N = 9985). We created two simulated datasets in which 40% of race and ethnicity data were missing, either at random or with a higher probability of missingness for non-Hispanic Black birthing people who had a preterm birth. We calculated C-statistics to evaluate how accurately BISG and BIFSG estimate race and ethnicity. We examined the association between race and ethnicity and preterm birth using logistic regression and reported odds ratios (OR).
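The two missingness scenarios described above can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function name `simulate_missing`, the group label `"NHB"`, and the `weight` of 3.0 are assumptions chosen for the example, and the weighted draw uses Efraimidis-Spirakis sampling without replacement as one convenient way to make missingness more likely in a chosen subgroup.

```python
import random

def simulate_missing(race, preterm, rate=0.40, mnar=False,
                     weight=3.0, group="NHB", seed=0):
    """Return a copy of `race` with a fraction `rate` of values set to None.

    mnar=False: every record is equally likely to lose its value.
    mnar=True:  records in `group` with a preterm birth are `weight` times
                more likely to be selected (hypothetical weighting).
    """
    rng = random.Random(seed)
    n = len(race)
    k = round(rate * n)  # number of records to blank out
    weights = [
        weight if (mnar and r == group and p) else 1.0
        for r, p in zip(race, preterm)
    ]
    # Efraimidis-Spirakis: draw u ~ U(0, 1) per record and keep the k
    # largest keys u ** (1 / w); larger weights are selected more often.
    keys = [rng.random() ** (1.0 / w) for w in weights]
    drop = set(sorted(range(n), key=keys.__getitem__, reverse=True)[:k])
    return [None if i in drop else r for i, r in enumerate(race)]


if __name__ == "__main__":
    # Toy cohort (illustrative counts only, not the study data).
    race = ["NHB"] * 200 + ["NHW"] * 800
    preterm = [True] * 100 + [False] * 100 + [True] * 80 + [False] * 720
    at_random = simulate_missing(race, preterm)
    not_at_random = simulate_missing(race, preterm, mnar=True)
    print(sum(v is None for v in at_random), "missing (at random)")
    print(sum(v is None for v in not_at_random[:100]),
          "of 100 NHB preterm records missing (not at random)")
```

Fixing the number of blanked records keeps the overall missingness at exactly 40% in both scenarios, so the two datasets differ only in who is missing.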
BISG and BIFSG showed high accuracy for most racial and ethnic categories (C-statistics = 0.94-0.97, 95% confidence intervals [CI] = 0.92-0.97). When race and ethnicity were not missing at random, BISG (OR = 1.25, CI = 0.97-1.62) and BIFSG (OR = 1.38, CI = 1.08-1.76) yielded positive associations for non-Hispanic Black birthing people, in the same direction as the true association (OR = 1.68, CI = 1.34-2.09), whereas traditional methods yielded estimates in the opposite direction (complete case OR = 0.62, CI = 0.41-0.94; multiple imputation OR = 0.63, CI = 0.40-0.98).
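For readers less familiar with how an odds ratio and its confidence interval are obtained, the sketch below computes an unadjusted OR with a Wald 95% CI from a 2x2 table. For a single binary exposure this equals the OR from an unadjusted logistic regression; the study's models may include additional terms, so this is an illustration only. The function name `odds_ratio_ci` and the counts in the usage example are hypothetical and are not taken from the study.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    All four cells must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Hypothetical counts: 20/100 preterm in one group, 10/100 in another.
    estimate, lower, upper = odds_ratio_ci(20, 80, 10, 90)
    print(f"OR = {estimate:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1 corresponds to a statistically significant association at the chosen level, which is how the intervals reported above can be read.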
BISG and BIFSG accurately estimate missing race and ethnicity in perinatal EHR data and reduce bias in preterm birth research; they are recommended over traditional methods for handling missing race and ethnicity data.