Singh Sahezpreet, Kaur Puneet, Kaur Inderdeep, Singh Gurpreet, Kaur Satinder, Kaur Parminder
Department of Computer Science, Guru Nanak Dev University, Amritsar, India.
Department of Computer Engineering and Technology, Guru Nanak Dev University, Amritsar, India.
Sci Rep. 2025 May 3;15(1):15544. doi: 10.1038/s41598-025-00509-1.
Pesticides and other synthetic agrochemicals play a critical role in emerging agricultural practices by enhancing crop productivity and protecting against pests and diseases. However, their widespread application has raised significant concerns about environmental balance and adverse human health impacts, including neurological disorders, cancers, and respiratory and metabolic effects, particularly among agricultural workers and vulnerable populations. Extensive literature has underscored the detrimental consequences of pesticides for human health. The use of machine learning algorithms for accurate risk evaluation and predictive modeling, however, remains underexplored and calls for novel solutions. This study investigates the impact of synthetic agrochemicals on human health using advanced machine learning techniques, combining multi-level feature selection, hybrid ensemble learning, SHAP, and a custom loss function to improve prediction accuracy. It presents a robust framework for assessing the health risks posed by agrochemicals, offering novel insights into risk assessment strategies. Data sourced from credible organizations, including WHO, CDC, EPA, NHANES, and USDA, underwent extensive preprocessing and analysis. Machine learning (ML) models such as Random Forest, LightGBM, and CatBoost were employed alongside feature selection methods such as mutual information gain (MI) and Recursive Feature Elimination (RFE). A custom loss function that penalizes false negatives was used to predict mortality cases more accurately and to reduce misclassification. Furthermore, Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) were used for model optimization. Results demonstrate the superiority of ensemble models, with LightGBM-PSO + CustomLoss achieving the highest performance: accuracy of 98.87%, precision of 98.59%, recall of 99.27%, and F1 score of 98.91%.
The findings of this study can contribute to policy making and regulatory frameworks for public safety and health. Future work will emphasize multi-regional datasets, external validation, real-world testing, and integration with public health monitoring systems.
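
The abstract does not give the form of the custom loss, only that it penalizes false negatives. As an illustration, one common way to do this is a class-weighted binary cross-entropy in which errors on positive (mortality) cases cost more than errors on negative cases. The sketch below is written under that assumption and is not the authors' implementation: the function names and the weight `fn_weight=5.0` are hypothetical. It also returns the per-sample gradient and Hessian with respect to the raw score, which is the form gradient-boosting libraries such as LightGBM typically expect from a user-defined objective.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fn_weighted_logloss(y_true, raw_scores, fn_weight=5.0):
    """Mean binary cross-entropy where errors on positive cases
    (potential false negatives) are multiplied by fn_weight.
    fn_weight is a hypothetical value, not taken from the paper."""
    eps = 1e-12
    total = 0.0
    for y, z in zip(y_true, raw_scores):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total += -(fn_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def fn_weighted_grad_hess(y_true, raw_scores, fn_weight=5.0):
    """Per-sample gradient and Hessian of the weighted loss
    with respect to the raw (pre-sigmoid) score."""
    grads, hess = [], []
    for y, z in zip(y_true, raw_scores):
        p = sigmoid(z)
        w = fn_weight if y == 1 else 1.0  # up-weight positive cases only
        grads.append(w * (p - y))
        hess.append(w * p * (1.0 - p))
    return grads, hess


# One confidently missed positive and one equally confident false alarm.
grads, hess = fn_weighted_grad_hess([1, 0], [-2.0, 2.0], fn_weight=5.0)
ratio = abs(grads[0]) / abs(grads[1])  # how much harder the missed positive is pushed
```

With these inputs the missed positive receives a gradient `fn_weight` times larger in magnitude than the false alarm, so training is pushed toward higher recall at some cost in precision. How large the weight should be is a tuning decision that the abstract does not specify.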