HExpPredict: Exposure Prediction of Human Blood Exposome Using a Random Forest Model and Its Application in Chemical Risk Prioritization.

Author information

Department of Environmental Science and Engineering, Fudan University, Shanghai, P.R. China.

Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore.

Publication information

Environ Health Perspect. 2023 Mar;131(3):37009. doi: 10.1289/EHP11305. Epub 2023 Mar 13.

Abstract

BACKGROUND

Because the human exposome contains so many substances, the exposure and toxicity information needed to assess potential health risks is scarce. Quantifying every trace organic compound in biological fluids appears infeasible and costly, quite apart from the high variability in individual exposure. We hypothesized that the blood concentration of organic pollutants could be predicted from their exposure and chemical properties. A prediction model built on chemicals annotated in human blood can provide new insight into the distribution and extent of exposure to a wide range of chemicals in humans.

OBJECTIVES

Our objective was to develop a machine learning (ML) model to predict blood concentrations of chemicals and to prioritize chemicals of health concern.

METHODS

We curated the blood concentrations of compounds, mostly measured at population levels, and developed an ML model to predict chemical blood concentrations from chemical daily exposure (DE), exposure pathway indicators, half-lives, and volume of distribution. Three ML models were compared: random forest (RF), artificial neural network (ANN), and support vector regression (SVR). The toxicity potential, or priority, of each chemical was expressed as a bioanalytical equivalency (BEQ) and its percentage (BEQ%), estimated from the predicted blood concentrations and ToxCast bioactivity data. We also retrieved the top 25 most active chemicals in each assay to observe how the BEQ% changed after drugs and endogenous substances were excluded.
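To make the BEQ% step concrete, here is a minimal Python sketch. It assumes that a chemical's BEQ in one assay is its predicted blood concentration divided by an assay potency value such as an AC50, and that BEQ% is that chemical's share of the summed BEQ for the assay. The abstract does not state the exact formula, so this definition, the function name `beq_percentages`, and the example values are all hypothetical illustrations, not the authors' implementation.

```python
def beq_percentages(predicted_conc_um, ac50_um):
    """Return each chemical's share (%) of the summed BEQ in one assay.

    ASSUMPTION: BEQ = predicted concentration / AC50 (both in uM).
    The abstract does not give the formula used in the paper.
    """
    beq = {}
    for name, conc in predicted_conc_um.items():
        ac50 = ac50_um.get(name)
        if ac50 is None or ac50 <= 0:
            continue  # skip chemicals with no usable potency in this assay
        beq[name] = conc / ac50
    total = sum(beq.values())
    if total == 0:
        return {name: 0.0 for name in beq}
    return {name: 100.0 * value / total for name, value in beq.items()}


# Made-up example values, for illustration only.
example_conc = {"chem_a": 0.02, "chem_b": 0.001, "chem_c": 0.5}
example_ac50 = {"chem_a": 1.0, "chem_b": 0.01, "chem_c": 50.0}
shares = beq_percentages(example_conc, example_ac50)
top_chemical = max(shares, key=shares.get)
```

In this toy example the chemical with the lowest concentration ranks first because it is also the most potent, which is the kind of reordering a BEQ-based prioritization is meant to surface.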

RESULTS

We curated the blood concentrations of 216 compounds, primarily measured at population levels. RF outperformed the ANN and SVR models, with a root mean square error (RMSE) of 1.66 and […], mean absolute error (MAE) values of 1.28 and […], a mean absolute percentage error (MAPE) of 0.29 and 0.23, and R² of 0.80 and 0.72 across test and testing sets. Subsequently, the human blood concentrations of 7,858 ToxCast chemicals were successfully predicted, ranging from […] to […]. The predicted concentrations were then combined with ToxCast bioassays to prioritize the ToxCast chemicals across 12 assays with important toxicological end points. Notably, the most active compounds were food additives and pesticides rather than widely monitored environmental pollutants.
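For readers unfamiliar with the four error metrics reported above, the following Python sketch computes them using their standard textbook definitions. Whether the paper computed them on raw or log-transformed concentrations is not stated in the abstract, so the function name `regression_metrics` and the sample numbers are illustrative assumptions only.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, MAPE (as a fraction), and R^2 for paired values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # MAPE is undefined when a true value is zero; inputs must be nonzero.
        "mape": sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Made-up measured and predicted values, for illustration only.
metrics = regression_metrics([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
```

RMSE penalizes large misses more heavily than MAE, while MAPE is scale-free, which is why reporting all of them alongside R² gives a fuller picture of model fit.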

DISCUSSION

We have shown that "internal exposure" can be predicted accurately from "external exposure," a result that can be quite useful in risk prioritization. https://doi.org/10.1289/EHP11305.

