Department of Environmental Health and Engineering, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.
QR Analytics, Washington DC, USA.
Ann Work Expo Health. 2022 Jun 6;66(5):580-590. doi: 10.1093/annweh/wxab105.
Occupational exposure assessments are dominated by small sample sizes and low spatial and temporal resolution, with a focus on Occupational Safety and Health Administration (OSHA) regulatory compliance sampling. This style of exposure assessment is likely to underestimate true exposures and their variability in sampled areas, and it entirely fails to characterize exposures in unsampled areas. The American Industrial Hygiene Association (AIHA) has developed a more realistic system of exposure ratings, based on estimating the 95th percentile of the exposure distribution, that better represents exposure uncertainty and variability for decision-making; however, with small sample sizes the ratings can still fail to capture realistic exposures. Low-cost sensor networks, consisting of numerous lower-quality sensors, have therefore been used to measure occupational exposures at high spatiotemporal resolution. These sensors must, however, be calibrated in the laboratory or field against a reference standard. Using data from carbon monoxide (CO) sensors deployed in a heavy equipment manufacturing facility for eight months, from August 2017 to March 2018, we demonstrate that machine learning with probabilistic gradient boosted decision trees (GBDT) can map raw sensor readings to reference data with high accuracy, entirely removing the need for laboratory calibration. Further, we show how the machine learning models can produce probabilistic hazard maps of the manufacturing floor, creating a visual tool for assessing facility-wide exposures. Additionally, having a fully modeled prediction distribution for each measurement enables the use of the AIHA exposure ratings, which provide a richer industrial decision-making framework than simply determining whether a small number of measurements fell above or below a pertinent occupational exposure limit. Lastly, we show how a probabilistic exposure assessment built on high spatiotemporal resolution data can prevent the exposure misclassifications associated with traditional models that rely exclusively on mean or point predictions.
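The contrast between a percentile-based rating and a mean-based one can be illustrated with a minimal Python sketch. It is not the authors' implementation: the category cut-offs (fractions of the occupational exposure limit), the 35 ppm limit, the lognormal draws standing in for a model's predictive distribution, and all function names are assumptions chosen for illustration only.

```python
import random
import statistics

# ASSUMPTION: illustrative cut-offs, expressed as fractions of the occupational
# exposure limit (OEL), for rating the 95th percentile. Not taken from the abstract.
ASSUMED_CUTOFFS = [(0.01, 0), (0.10, 1), (0.50, 2), (1.00, 3)]


def percentile_95(samples):
    """Empirical 95th percentile with linear interpolation between order statistics."""
    ordered = sorted(samples)
    position = 0.95 * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def exposure_category(value, oel):
    """Map a concentration to a rating 0-4 using the assumed cut-offs."""
    ratio = value / oel
    for upper_bound, category in ASSUMED_CUTOFFS:
        if ratio <= upper_bound:
            return category
    return 4  # above the OEL


def rate_location(samples, oel):
    """Rate one location from draws of its predictive distribution."""
    mean = statistics.fmean(samples)
    p95 = percentile_95(samples)
    return {
        "mean": mean,
        "p95": p95,
        "category_from_mean": exposure_category(mean, oel),
        "category_from_p95": exposure_category(p95, oel),
    }


# ASSUMPTION: hypothetical CO predictive draws (ppm) for a single location,
# simulated from a right-skewed lognormal; 35 ppm is an illustrative OEL.
rng = random.Random(0)
draws = [rng.lognormvariate(2.3, 0.9) for _ in range(5000)]
result = rate_location(draws, oel=35.0)
print(result)
```

For a right-skewed distribution such as this one, the 95th percentile sits well above the mean, so a rating based on the mean alone can land in a lower category than one based on the upper tail. That is the kind of misclassification the abstract attributes to point-prediction models.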