Shanghai Municipal Center for Disease Control and Prevention, Shanghai, China.
Department of Environmental Health, Key Laboratory of the Public Health Safety, Ministry of Education, School of Public Health, Fudan University, Shanghai, China.
Front Public Health. 2023 Aug 10;11:1213453. doi: 10.3389/fpubh.2023.1213453. eCollection 2023.
People usually spend most of their time indoors, so indoor fine particulate matter (PM2.5) concentrations are crucial for refining individual PM2.5 exposure evaluation. The development of indoor PM2.5 concentration prediction models is essential for the health risk assessment of PM2.5 in epidemiological studies involving large populations.
In this study, monitoring data from multiple types of places were used to develop hourly average indoor PM2.5 concentration prediction models with two methods: classical multiple linear regression (MLR) and the random forest regression (RFR) machine learning algorithm. Indoor PM2.5 concentration data, comprising 11,712 records from five types of places, were obtained by on-site monitoring. Data for the potential predictor variables were derived from outdoor monitoring stations and meteorological databases. Ten-fold cross-validation was conducted to examine the performance of all proposed models.
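The validation procedure described above can be illustrated with a minimal sketch, which is not the authors' implementation. It fits a linear regression by ordinary least squares and scores it with ten-fold cross-validation, using only the Python standard library. The helper names (`fit_ols`, `predict`, `r_squared`, `cross_validated_r2`), the two predictors, and the synthetic data are all hypothetical and chosen for illustration; no monitoring data from the study are reproduced.

```python
import random

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(beta, X):
    """Apply intercept beta[0] and slopes beta[1:] to each row of X."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def cross_validated_r2(X, y, k=10, seed=0):
    """Pool held-out predictions from k folds and score them once."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    held_out = [None] * len(y)
    for fold in folds:
        test = set(fold)
        train = [i for i in idx if i not in test]
        beta = fit_ols([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(fold, predict(beta, [X[i] for i in fold])):
            held_out[i] = p
    return r_squared(y, held_out)

# Synthetic data (assumption): outdoor concentration and wind speed as predictors
rng = random.Random(42)
X = [[rng.uniform(5, 150), rng.uniform(0, 8)] for _ in range(500)]
y = [10 + 0.5 * o - 1.2 * w + rng.gauss(0, 8) for o, w in X]
cv_r2 = cross_validated_r2(X, y)
print(f"10-fold CV R^2: {cv_r2:.3f}")
```

Every record is predicted exactly once by a model that never saw it during fitting, so the pooled score estimates out-of-sample performance. In practice a library such as scikit-learn would likely be used for both the linear and random forest models; the abstract does not state which software the authors used.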
The final predictor variables incorporated in the MLR model were outdoor PM2.5 concentration, type of place, season, wind direction, surface wind speed, hour, precipitation, air pressure, and relative humidity. The ten-fold cross-validation results indicated that both models had good predictive performance, with determination coefficients (R²) of 72.20% for RFR and 60.35% for MLR. Overall, the RFR model outperformed the MLR model, even when it was developed using the same predictor variables as the MLR model (R² = 71.86%). The variable-importance results for both types of models suggested that outdoor PM2.5 concentration, type of place, season, hour, wind direction, and surface wind speed were the most important predictor variables.
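For reference, the determination coefficient reported above is conventionally defined as follows. The abstract does not give the exact formula used, so this is the standard definition rather than a confirmed detail of the study. The symbols are assumptions: $y_i$ is an observed hourly indoor concentration, $\hat{y}_i$ is its cross-validated prediction, and $\bar{y}$ is the mean of the observations.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
               {\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

Under this definition, a value of 72.20% means the model's held-out predictions account for roughly 72% of the variance in the observed indoor concentrations.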
In this research, hourly average indoor PM2.5 concentration prediction models based on multiple types of places were developed for the first time. Both the MLR and RFR models, built on easily accessible indicators, displayed promising predictive performance, and the machine learning RFR model outperformed the classical MLR model. This result suggests the potential of RFR algorithms for predicting indoor air pollutant concentrations.