Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education)/Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China.
Tianjin Key Laboratory of Agro-environment and Safe-product, Key Laboratory for Environmental Factors Control of Agro-product Quality Safety (Ministry of Agriculture and Rural Affairs), Institute of Agro-environmental Protection, Ministry of Agriculture and Rural Affairs, Tianjin 300191, China.
J Hazard Mater. 2022 Sep 15;438:129487. doi: 10.1016/j.jhazmat.2022.129487. Epub 2022 Jun 27.
Over the past few decades, data-driven machine learning (ML) has distinguished itself from hypothesis-driven studies and has recently received much attention in environmental toxicology. However, the use of ML in environmental toxicology remains at an early stage, hindered by knowledge gaps, technical bottlenecks in data quality, the analysis of high-dimensional, heterogeneous, and small-sample data, and model interpretability, as well as a lack of in-depth understanding of environmental toxicology. In light of these problems, we review recent progress in the literature and highlight state-of-the-art toxicological studies using ML (such as learning and predicting toxicity in complicated biosystems and in multiple-factor environmental scenarios of long-term and large-scale pollution). Beyond predicting simple biological endpoints, ML development should focus on revealing toxicological mechanisms by integrating untargeted omics and adverse outcome pathways. Integrating data-driven ML with other methods (e.g., omics analysis and adverse outcome pathway frameworks) gives ML broad promise for revealing toxicological mechanisms. High-quality databases and interpretable algorithms are urgently needed for toxicology and environmental science. Addressing the core issues and future challenges for ML identified in this review may narrow the knowledge gap between environmental toxicity and computational science and facilitate the control of environmental risk in the future.