Department of Veterinary Physiology and Pharmacology, Interdisciplinary Faculty of Toxicology, Texas A&M University, College Station, Texas 77843, United States.
Quantitative Sustainability Assessment, Department of Environmental and Resource Engineering, Technical University of Denmark, Bygningstorvet 115, 2800 Kgs. Lyngby, Denmark.
Environ Sci Technol. 2024 Sep 3;58(35):15638-15649. doi: 10.1021/acs.est.4c00172. Epub 2024 May 2.
Chemical points of departure (PODs) for critical health effects are crucial for evaluating and managing human health risks and impacts from exposure. However, PODs are unavailable for most chemicals in commerce due to a lack of toxicity data. We therefore developed a two-stage machine learning (ML) framework to predict human-equivalent PODs for oral exposure to organic chemicals based on chemical structure. Utilizing ML-based predictions for structural/physical/chemical/toxicological properties from OPERA 2.9 as features (Stage 1), ML models using random forest regression were trained with human-equivalent PODs derived from data sets for general noncancer effects (n = 1,791) and reproductive/developmental effects (n = 2,228), with robust cross-validation for selecting features and estimating generalization errors (Stage 2). These two-stage models accurately predicted PODs for both effect categories with cross-validation-based root-mean-squared errors less than an order of magnitude. We then applied one or both models to 34,046 chemicals expected to be in the environment, revealing several thousand chemicals of concern and several hundred chemicals of concern for health effects at estimated median population exposure levels. Further application can expand by orders of magnitude the coverage of organic chemicals that can be evaluated for their human health risks and impacts.
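The Stage 2 step described above (random forest regression evaluated by cross-validation, with error reported in orders of magnitude) can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the five features merely stand in for OPERA-predicted properties, and the names `X`, `log10_pod`, and `fold_error` are hypothetical. It assumes scikit-learn and NumPy are available and that PODs are modeled on a log10 scale, so that an RMSE below 1 corresponds to "less than an order of magnitude".

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): 300 "chemicals" x 5 predicted properties.
n_chemicals, n_features = 300, 5
X = rng.normal(size=(n_chemicals, n_features))

# Synthetic log10 POD values driven by two features plus noise (assumption).
log10_pod = (
    1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1]
    + rng.normal(scale=0.3, size=n_chemicals)
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Out-of-fold predictions: each chemical is predicted by a model
# that was fitted without it.
pred = cross_val_predict(model, X, log10_pod, cv=cv)

# RMSE in log10 units; a value below 1 means the typical error
# is less than a factor of 10 on the original scale.
rmse_log10 = float(np.sqrt(np.mean((pred - log10_pod) ** 2)))
fold_error = 10 ** rmse_log10

print(f"Cross-validated RMSE: {rmse_log10:.2f} log10 units")
print(f"Equivalent multiplicative error: about {fold_error:.1f}-fold")
```

Working on the log10 scale is a common choice for toxicity values that span many orders of magnitude; whether the published models used exactly this transformation and fold scheme is not stated in the abstract.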