Chen Jie, de Hoogh Kees, Gulliver John, Hoffmann Barbara, Hertel Ole, Ketzel Matthias, Weinmayr Gudrun, Bauwelinck Mariska, van Donkelaar Aaron, Hvidtfeldt Ulla A, Atkinson Richard, Janssen Nicole A H, Martin Randall V, Samoli Evangelia, Andersen Zorana J, Oftedal Bente M, Stafoggia Massimo, Bellander Tom, Strak Maciej, Wolf Kathrin, Vienneau Danielle, Brunekreef Bert, Hoek Gerard
Institute for Risk Assessment Sciences (IRAS), Utrecht University, Postbus 80125, 3508 TC Utrecht, The Netherlands.
Swiss Tropical and Public Health Institute, Socinstrasse 57, 4051 Basel, Switzerland.
Environ Sci Technol. 2020 Dec 15;54(24):15698-15709. doi: 10.1021/acs.est.0c06595. Epub 2020 Nov 25.
We developed Europe-wide models of long-term exposure to eight elements (copper, iron, potassium, nickel, sulfur, silicon, vanadium, and zinc) in particulate matter with diameter <2.5 μm (PM2.5) using standardized measurements for one-year periods between October 2008 and April 2011 in 19 study areas across Europe, with supervised linear regression (SLR) and random forest (RF) algorithms. Potential predictor variables were obtained from satellites, chemical transport models, land-use, traffic, and industrial point source databases to represent different sources. Overall model performance across Europe was moderate to good for all elements with hold-out-validation R-squared ranging from 0.41 to 0.90. RF consistently outperformed SLR. Models explained within-area variation much less than the overall variation, with similar performance for RF and SLR. Maps proved a useful additional model evaluation tool. Models differed substantially between elements regarding major predictor variables, broadly reflecting known sources. Agreement between the two algorithm predictions was generally high at the overall European level and varied substantially at the national level. Applying the two models in epidemiological studies could lead to different associations with health. If both between- and within-area exposure variability are exploited, RF may be preferred. If only within-area variability is used, both methods should be interpreted equally.
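The distinction between overall and within-area performance can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the R-squared definition (1 - SS_res/SS_tot), the within-area approach of subtracting each study area's mean from both observed and predicted values, the function names, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def center_by_area(values, areas):
    """Subtract each study area's mean so only within-area contrasts remain."""
    by_area = defaultdict(list)
    for value, area in zip(values, areas):
        by_area[area].append(value)
    area_mean = {area: mean(vals) for area, vals in by_area.items()}
    return [value - area_mean[area] for value, area in zip(values, areas)]


def overall_and_within_r2(observed, predicted, areas):
    """Return (overall R2, within-area R2) for hold-out predictions."""
    overall = r_squared(observed, predicted)
    within = r_squared(center_by_area(observed, areas),
                       center_by_area(predicted, areas))
    return overall, within


# Hypothetical hold-out data: two study areas with very different mean levels.
areas = ["A", "A", "A", "B", "B", "B"]
observed = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
# A model that reproduces each area's mean but no within-area contrast.
predicted = [2.0, 2.0, 2.0, 12.0, 12.0, 12.0]

overall, within = overall_and_within_r2(observed, predicted, areas)
print(f"overall R2 = {overall:.3f}, within-area R2 = {within:.3f}")
```

In this toy case the model captures only the difference between areas, so the overall R-squared is high (about 0.97) while the within-area R-squared is zero. This is the kind of gap between overall and within-area performance that the abstract describes.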