Frndak Seth, Queirolo Elena I, Mañay Nelly, Yu Guan, Ahmed Zia, Barg Gabriel, Colder Craig, Kordas Katarzyna
Department of Epidemiology and Environmental Health, University at Buffalo, The State University of New York, Buffalo, New York, United States of America.
Department of Neuroscience and Learning, Catholic University of Uruguay, Montevideo, Uruguay.
PLOS Glob Public Health. 2024 Sep 4;4(9):e0003607. doi: 10.1371/journal.pgph.0003607. eCollection 2024.
Predicting childhood blood lead levels (BLLs) has had mixed success, and it is unclear whether individual- or neighborhood-level variables are more predictive. An ensemble machine learning (ML) approach was implemented to identify the most relevant predictors of BLL ≥2 μg/dL in urban children. A cross-sectional sample of 603 children (~7 years of age), recruited between 2009 and 2019 in Montevideo, Uruguay, participated in the study. In total, 77 individual-level and 32 neighborhood-level variables were used to predict BLLs ≥2 μg/dL. Three ensemble learners were created: one with individual-level predictors (Ensemble-I), one with neighborhood-level predictors (Ensemble-N), and one with both (Ensemble-All). Each ensemble learner comprised four base classifiers, and the data were split into 50% training, 25% validation, and 25% test datasets. Predictive performance of the three ensemble models was compared on the test dataset using area under the curve (AUC) for the receiver operating characteristic (ROC), precision, sensitivity, and specificity. Ensemble-I (AUC: 0.75, precision: 0.56, sensitivity: 0.79, specificity: 0.65) performed similarly to Ensemble-All (AUC: 0.75, precision: 0.63, sensitivity: 0.79, specificity: 0.69). Ensemble-N (AUC: 0.51, precision: 0.0, sensitivity: 0.0, specificity: 0.50) severely underperformed. Year of enrollment was the most important predictor in Ensemble-I and Ensemble-All, followed by household water Pb. Three neighborhood-level variables were among the 10 most important predictors in Ensemble-All (density of bus routes, dwellings with stream/other water source, and distance to nearest river). The individual-level-only model performed best, although precision improved when both neighborhood- and individual-level variables were included. Future predictive models of lead exposure should consider proximal predictors (i.e., household characteristics).
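The evaluation described above (a 50%/25%/25% split, then AUC, precision, sensitivity, and specificity on the test set) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `split_50_25_25`, `confusion_metrics`, and `roc_auc` are hypothetical, the random shuffle and the zero-division convention are assumptions, and the four base classifiers are not reproduced because the abstract does not name them.

```python
import random


def split_50_25_25(rows, seed=0):
    """Shuffle rows and split them into 50% train, 25% validation, 25% test.

    Assumption: a simple random split; the abstract does not say whether
    the authors stratified by outcome.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) // 2
    n_val = len(rows) // 4
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]  # remainder, so no row is dropped
    return train, val, test


def confusion_metrics(y_true, y_pred):
    """Precision, sensitivity, and specificity for binary labels (1 = BLL >= 2 ug/dL)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Assumption: report 0.0 when a denominator is zero.
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive case scores higher
    than a random negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train, val, test = split_50_25_25(range(603))
    print(len(train), len(val), len(test))
    print(confusion_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

With 603 records, this split yields 301 training, 150 validation, and 152 test rows; the exact partition sizes used in the study are not stated in the abstract. The pairwise AUC is quadratic in sample size, which is acceptable for a test set of this size.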