Okati Narjes, Ebrahimi-Khusfi Zohre, Zandifar Samira, Taghizadeh-Mehrjardi Ruhollah
Department of Environment, Faculty of Natural Resources, University of Zabol, Zabol, Iran.
Department of Environmental Science and Engineering, Faculty of Natural Resources, University of Jiroft, Jiroft, Iran.
Environ Geochem Health. 2025 May 31;47(7):239. doi: 10.1007/s10653-025-02533-6.
Predicting hair mercury (Hg) levels and identifying the factors that drive them are necessary for developing preventive strategies to reduce Hg exposure in different regions. This study is the first effort to investigate the effectiveness of eight machine learning (ML) models (multiple linear regression, decision tree regression, least absolute shrinkage and selection operator, multivariate adaptive regression splines, random forest, extreme gradient boosting, K-nearest neighbor, and Gaussian process) for predicting hair Hg levels and identifying the most important factors affecting them in residents of southwestern Iran. All ML models were trained on 70% of the dataset, and their performance was evaluated on the remaining 30% using the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). Finally, the Permutation Feature Importance (PFI) method was used to determine the relative importance (RI) of the influencing factors. Mean hair Hg (3.31 µg g⁻¹) was higher than the United States Environmental Protection Agency (US EPA) and World Health Organization (WHO) limits, indicating a high exposure risk for some people in this region. The extreme gradient boosting (XGB) model outperformed the other algorithms in modeling hair Hg levels, with R² = 0.61, RMSE = 2.2, and MAE = 1.25. According to the PFI analysis, weight (RI: 43.4%) and geographic location (RI: 41.8%) were the most important demographic factors influencing Hg variation in the study population. Additionally, occupation (RI: 46.1%) and the frequency of fish and canned fish consumption (RI: 22%) were identified as the most significant exposure factors controlling hair Hg variability in southwestern Iran. These findings can be useful for formulating appropriate strategies to reduce the health risk of Hg exposure and improve human health.
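The evaluation workflow summarized in the abstract (a 70/30 split, three error metrics, and permutation feature importance rescaled to relative importance) can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the three feature names are hypothetical, and a small K-nearest neighbor regressor stands in for the eight models, so the numbers it produces have no relation to the reported results.

```python
import math
import random

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def knn_predict(X_train, y_train, X, k=5):
    """Predict each row as the mean target of its k nearest training rows."""
    preds = []
    for row in X:
        order = sorted(range(len(X_train)), key=lambda i: math.dist(row, X_train[i]))
        preds.append(sum(y_train[i] for i in order[:k]) / k)
    return preds

def relative_importance(predict, X, y, n_repeats=10, seed=0):
    """Permutation feature importance, rescaled so the values sum to 100%."""
    rng = random.Random(seed)
    baseline = rmse(y, predict(X))
    increases = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += rmse(y, predict(X_perm)) - baseline
        increases.append(max(total / n_repeats, 0.0))  # clip negative noise to zero
    scale = sum(increases)
    return [100.0 * inc / scale if scale > 0 else 0.0 for inc in increases]

# Synthetic stand-in data (assumption): three hypothetical features, one irrelevant.
feature_names = ["fish_meals_per_week", "body_weight", "noise_feature"]
rng = random.Random(42)
X = [[rng.random(), rng.random(), rng.random()] for _ in range(200)]
y = [3.0 * a + 0.5 * b + rng.gauss(0.0, 0.1) for a, b, _ in X]

# 70% of the rows for training, the remaining 30% for evaluation.
indices = list(range(len(X)))
rng.shuffle(indices)
cut = int(0.7 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

def model(rows):
    return knn_predict(X_train, y_train, rows)

y_pred = model(X_test)
scores = {"R2": r2_score(y_test, y_pred), "RMSE": rmse(y_test, y_pred), "MAE": mae(y_test, y_pred)}
ri = relative_importance(model, X_test, y_test)

print(scores)
for name, value in zip(feature_names, ri):
    print(f"{name}: RI = {value:.1f}%")
```

Importance is measured here as the mean increase in held-out RMSE after shuffling one feature at a time. That is one common choice of scoring function, and the abstract does not state which one the study used.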