DIBAF Department, University of Tuscia, 01100 Viterbo, Italy.
School of Engineering, University of Basilicata, Viale dell'Ateneo Lucano 10, 85100 Potenza, Italy.
Int J Environ Res Public Health. 2024 Jul 2;21(7):867. doi: 10.3390/ijerph21070867.
Several studies suggest that environmental and climatic factors are linked to the risk of mortality due to cardiovascular and respiratory diseases; however, it is still unclear which factors are the most influential. This study explores the potential of a data-driven statistical approach through a case study analysis.
Daily emergency room admissions for cardiovascular and respiratory diseases are analyzed jointly with daily values of environmental and climatic parameters (temperature, atmospheric pressure, relative humidity, carbon monoxide, ozone, particulate matter, and nitrogen dioxide). A Random Forest (RF) model and feature importance measure (FMI) techniques, namely permutation feature importance (PFI), Shapley Additive exPlanations (SHAP) feature importance, and the derivative-based importance measure (κALE), are applied to discriminate the role of each environmental and climatic parameter. Data are pre-processed to remove trend and seasonal behavior using the Seasonal-Trend decomposition using LOESS (STL) method and are preliminarily analyzed to avoid redundancy of information.
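As an illustration of the permutation feature importance idea named above, the following is a minimal, model-agnostic sketch in plain Python. It is not the study's pipeline: the data are synthetic, a fixed linear function (`toy_model`) stands in for the fitted Random Forest, and the helper names (`permutation_importance`, `mean_absolute_error`) are hypothetical. The importance of a feature is taken here as the average increase in mean absolute error when that feature's column is shuffled.

```python
import random
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MAE when a single feature column is shuffled.

    `predict` is any callable mapping one feature row to a prediction
    (an assumption of this sketch; a fitted RF would play this role).
    """
    rng = random.Random(seed)
    baseline = mean_absolute_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link with the target.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mean_absolute_error(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(mean(increases))
    return importances


# Synthetic stand-in data: three hypothetical, independent features.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]


def toy_model(row):
    # Hypothetical "fitted model": depends strongly on feature 0,
    # weakly on feature 1, and not at all on feature 2.
    return 3.0 * row[0] + 1.0 * row[1]


y = [toy_model(row) + data_rng.gauss(0, 0.1) for row in X]

pfi = permutation_importance(toy_model, X, y)
ranking = sorted(range(len(pfi)), key=lambda j: pfi[j], reverse=True)
print("importance per feature:", pfi)
print("ranking (most to least important):", ranking)
```

A feature the model never uses receives an importance of zero, since shuffling it leaves every prediction unchanged; this is the sense in which such measures can rank parameters by their contribution to prediction accuracy.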
The RF performance is encouraging: the model predicts cardiovascular and respiratory disease admissions with a mean absolute relative error of 0.04 and 0.05 cases per day, respectively. The feature importance measures discriminate between parameter behaviors and provide importance rankings. In particular, only three parameters (temperature, atmospheric pressure, and carbon monoxide) were responsible for most of the total prediction accuracy.
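The abstract does not spell out how the mean absolute relative error is computed. One common definition, assumed here purely for illustration, averages the absolute prediction error divided by the observed value. The function name and the example numbers below are hypothetical and are not taken from the study.

```python
def mean_absolute_relative_error(y_true, y_pred):
    """Assumed definition: mean of |observed - predicted| / |observed|.

    Observations equal to zero are skipped because the ratio is
    undefined there (a choice made for this sketch only).
    """
    ratios = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred) if t != 0]
    if not ratios:
        raise ValueError("no non-zero observations to evaluate")
    return sum(ratios) / len(ratios)


# Hypothetical daily admission counts and predictions.
observed = [10, 20]
predicted = [9, 22]
mare = mean_absolute_relative_error(observed, predicted)
print(round(mare, 3))
```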
Data-driven statistical tools, such as feature importance measures, are promising for discriminating the role of environmental and climatic factors in predicting the risk related to cardiovascular and respiratory diseases. Our results reveal the potential of employing these tools in public health policy applications, both for developing early warning systems that address health risks associated with climate change and for improving disease prevention strategies.