Center for Research on Computation and Society, Harvard University John A Paulson School of Engineering and Applied Sciences, Cambridge, Massachusetts, USA.
Computer Science Department, Harvard College, Cambridge, Massachusetts, USA.
BMJ Glob Health. 2023 Oct;8(10). doi: 10.1136/bmjgh-2023-012836.
Many children in low-income and middle-income countries fail to receive any routine vaccinations. There is little evidence on how to effectively and efficiently identify and target such 'zero-dose' (ZD) children.
We examined how well predictive algorithms can characterise a child's risk of being ZD based on predictor variables that are available in routine administrative data. We applied supervised learning algorithms with three increasingly rich sets of predictors and multiple years of data from India, Mali and Nigeria. We assessed performance based on specificity, sensitivity and the F1 Score and investigated feature importance. We also examined how performance decays when the model is trained on older data. For data from India in 2015, we further compared the inclusion and exclusion errors of the algorithmic approach with a simple geographical targeting approach based on district full-immunisation coverage.
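The three performance measures named above can be computed directly from a confusion matrix. The following is a minimal sketch, not the authors' code: it assumes ZD is coded as the positive class (1), and the function name `classification_metrics` and the toy labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and F1, assuming 1 = zero-dose (positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # ZD flagged as high risk
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-ZD flagged as low risk
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-ZD flagged as high risk
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # ZD missed by the model

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


# Toy labels (illustrative only, not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Which class is treated as positive determines which quantity is called sensitivity and which specificity, so the coding convention should be stated alongside any reported values.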
Cost-sensitive Ridge classification correctly classifies most ZD children as being at high risk in most country-years (high specificity). Performance did not meaningfully increase when predictors were added beyond an initial sparse set of seven variables. Region and measures of contact with the health system (antenatal care and birth in a facility) had the highest feature importance. Model performance decreased as the time between the data on which the model was trained and the data to which it was applied (test data) increased. The exclusion error of the algorithmic approach was about 9.1% lower than the exclusion error of the geographical approach. Furthermore, for the same number of children targeted, the algorithmic approach detected ZD children in 176 more areas than the geographical rule.
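The comparison between algorithmic and geographical targeting at a fixed number of children targeted can be illustrated as follows. This is a sketch under stated assumptions rather than the study's procedure: exclusion error is taken to be the share of ZD children who are not targeted, inclusion error the share of targeted children who are not ZD, and all names and toy values are hypothetical.

```python
def targeting_errors(is_zd, targeted):
    """Return (exclusion_error, inclusion_error) under the assumed definitions."""
    zd_total = sum(is_zd)
    n_targeted = sum(targeted)
    missed = sum(1 for z, t in zip(is_zd, targeted) if z and not t)
    wrongly_included = sum(1 for z, t in zip(is_zd, targeted) if t and not z)
    exclusion = missed / zd_total if zd_total else 0.0
    inclusion = wrongly_included / n_targeted if n_targeted else 0.0
    return exclusion, inclusion


def target_top_k(scores, k):
    """Target the k children with the highest predicted risk."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:k])
    return [i in chosen for i in range(len(scores))]


# Toy data (illustrative only): eight children in two districts.
district = ["A", "A", "A", "A", "B", "B", "B", "B"]
is_zd = [1, 1, 0, 0, 1, 0, 0, 0]
risk = [0.9, 0.8, 0.3, 0.2, 0.7, 0.4, 0.1, 0.1]  # hypothetical model scores

# Geographical rule: target every child in the low-coverage district (assumed "A").
geo_targeted = [d == "A" for d in district]

# Algorithmic rule: target the same number of children, ranked by predicted risk.
k = sum(geo_targeted)
algo_targeted = target_top_k(risk, k)

geo_errors = targeting_errors(is_zd, geo_targeted)
algo_errors = targeting_errors(is_zd, algo_targeted)
print("geographical:", geo_errors)
print("algorithmic:", algo_errors)
```

Holding the number of targeted children constant keeps the two rules comparable in cost, so any difference in exclusion error reflects how well each rule ranks children rather than how many it reaches.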
Predictive algorithms applied to existing data can effectively identify ZD children and could be deployed at low cost to target interventions to reduce ZD prevalence and inequities in vaccination coverage.