Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia.
J Public Health Manag Pract. 2024;30(5):733-743. doi: 10.1097/PHH.0000000000002014. Epub 2024 Jul 22.
Injection drug use (IDU) is a major contributor to the syndemic of viral hepatitis, human immunodeficiency virus, and drug overdose. However, information on IDU is frequently missing in national viral hepatitis surveillance data, which limits our understanding of the full extent of IDU-associated infections. Multiple imputation by chained equations (MICE) has become a popular approach to address missing data, but its application for IDU imputation is less studied.
Using 2019-2021 acute hepatitis C case data from the National Notifiable Diseases Surveillance System and publicly available county-level measures, we evaluated listwise deletion (LD) and 3 MICE models for imputing missing IDU data: parametric logistic regression, semiparametric predictive mean matching (PMM), and nonparametric random forest (RF), the last in both a standard (sRF) and a fast (fRF) implementation.
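As an illustration of how estimates from multiply imputed datasets are typically combined, the following minimal sketch pools a prevalence estimate with Rubin's rules. It assumes that the per-dataset prevalences and their variances have already been computed; the function name `pool_rubin` and the input numbers are hypothetical and are not taken from the study, whose pooling procedure is not described here.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their within-imputation variances.

    Returns (pooled estimate, total variance) following Rubin's rules:
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least 2 estimates and matching variances")
    pooled = mean(estimates)         # average of the m point estimates
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # sample variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical IDU prevalences from 5 imputed datasets of n = 1000 cases each.
n = 1000
prevalences = [0.84, 0.85, 0.86, 0.85, 0.84]
within_vars = [p * (1 - p) / n for p in prevalences]  # binomial variance
pooled, total_var = pool_rubin(prevalences, within_vars)
print(f"pooled prevalence = {pooled:.3f}, standard error = {total_var ** 0.5:.4f}")
```

The between-imputation term is what makes the pooled variance reflect uncertainty about the missing values, which LD ignores.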
The estimated IDU prevalence among acute hepatitis C cases was 63.5% under LD, compared with 65.1% under logistic regression, 66.9% under PMM, 76.0% under sRF, and 85.1% under fRF. Evaluation studies showed that RF-based MICE imputation, especially fRF, had the highest accuracy (smallest raw bias, percent bias, and root mean square error) and the highest efficiency (narrowest 95% confidence interval) compared with LD and the other models. Sensitivity analyses indicated that fRF remained robust when data were missing not at random.
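The accuracy and efficiency measures named above can be sketched as follows. This is a minimal example that assumes the commonly used simulation-study definitions (raw bias as the mean estimate minus the true value, percent bias as its absolute value relative to the truth, root mean square error over replicates, and average 95% confidence interval width); the abstract does not give the exact formulas the authors used, and the function name `evaluate` and the input values are hypothetical.

```python
import math
from statistics import mean


def evaluate(estimates, ci_lowers, ci_uppers, truth):
    """Summarize replicate estimates of one quantity against a known truth."""
    raw_bias = mean(estimates) - truth
    percent_bias = 100 * abs(raw_bias / truth)
    rmse = math.sqrt(mean([(e - truth) ** 2 for e in estimates]))
    avg_ci_width = mean([u - l for l, u in zip(ci_lowers, ci_uppers)])
    return {
        "raw_bias": raw_bias,
        "percent_bias": percent_bias,
        "rmse": rmse,
        "avg_ci_width": avg_ci_width,
    }


# Hypothetical replicate results for one imputation method; the true
# prevalence is known because missingness was induced artificially.
result = evaluate(
    estimates=[0.79, 0.81, 0.80, 0.82],
    ci_lowers=[0.76, 0.78, 0.77, 0.79],
    ci_uppers=[0.82, 0.84, 0.83, 0.85],
    truth=0.80,
)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions, smaller values on all four measures indicate a better-performing method.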
Our analysis suggests that RF-based MICE imputation, especially fRF, could be a valuable approach for addressing missing IDU data in population-based surveillance systems such as the National Notifiable Diseases Surveillance System. Including imputed IDU data may enhance the effectiveness of future surveillance and prevention efforts for the IDU-driven syndemic.