National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, 27711, USA.
Currently at Oregon State University, Corvallis, USA.
Arch Toxicol. 2018 Feb;92(2):587-600. doi: 10.1007/s00204-017-2067-x. Epub 2017 Oct 27.
To address a major challenge in chemical safety assessment, the need for alternative approaches to characterizing systemic effect levels, a predictive model was developed. Systemic effect levels were curated from ToxRefDB, HESS-DB, and COSMOS-DB across numerous study types, totaling 4379 in vivo studies for 1247 chemicals. Observed systemic effects in mammalian models are a complex function of chemical dynamics, kinetics, and inter- and intra-individual variability. To address this complexity, systemic effect levels were modeled at the study level by leveraging study covariates (e.g., study type, strain, administration route) in addition to multiple descriptor sets, including chemical (ToxPrint, PaDEL, and Physchem), biological (ToxCast), and kinetic descriptors. Using random forest modeling with cross-validation and external validation procedures, study-level covariates alone accounted for approximately 15% of the variance, reducing the root mean squared error (RMSE) from 0.96 to 0.85 log mg/kg/day and providing a baseline performance metric (lower expectation of model performance). A consensus model developed using a combination of study-level covariates and chemical, biological, and kinetic descriptors explained a total of 43% of the variance with an RMSE of 0.69 log mg/kg/day. A benchmark model (upper expectation of model performance) was also developed by incorporating study-level covariates and the mean effect level per chemical, yielding an RMSE of 0.5 log mg/kg/day. To achieve a representative chemical-level prediction, the minimum study-level predicted and observed effect levels per chemical were compared, reducing the RMSE from 1.0 to 0.73 log mg/kg/day, equivalent to 87% of predictions falling within an order of magnitude of the observed value.
Although biological descriptors did not improve model performance, the final model was enriched for biological descriptors that indicated xenobiotic metabolism gene expression, oxidative stress, and cytotoxicity, demonstrating the importance of accounting for kinetics and non-specific bioactivity in predicting systemic effect levels. Herein, we generated an externally predictive model of systemic effect levels for use as a safety assessment tool and have generated forward predictions for over 30,000 chemicals.
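The chemical-level comparison described above (taking the minimum study-level predicted and observed effect level per chemical, then reporting RMSE and the share of predictions within one order of magnitude) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the record layout, and the numeric values are all hypothetical and serve only to show the arithmetic on log10 mg/kg/day values.

```python
import math

def rmse(pred, obs):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))

def chemical_level_minimum(records):
    """Collapse study-level rows to one row per chemical.

    records: iterable of (chemical_id, predicted_log, observed_log),
    where both values are log10 effect levels (hypothetical layout).
    Returns the chemical ids plus the per-chemical minimum predicted
    and minimum observed values, in matching order.
    """
    pred_min, obs_min = {}, {}
    for chem, p, o in records:
        pred_min[chem] = min(p, pred_min.get(chem, math.inf))
        obs_min[chem] = min(o, obs_min.get(chem, math.inf))
    chems = sorted(pred_min)
    return chems, [pred_min[c] for c in chems], [obs_min[c] for c in chems]

def fraction_within(pred, obs, tol=1.0):
    """Share of predictions within `tol` log units of the observed value
    (tol=1.0 corresponds to one order of magnitude)."""
    return sum(abs(p - o) <= tol for p, o in zip(pred, obs)) / len(pred)

# Made-up study-level values, for illustration only (not data from the paper).
records = [
    ("chem_A", 1.2, 1.0),
    ("chem_A", 0.8, 1.5),
    ("chem_B", 2.0, 0.5),
    ("chem_B", 2.4, 1.1),
    ("chem_C", -0.3, 0.1),
]

chems, pred, obs = chemical_level_minimum(records)
chem_rmse = rmse(pred, obs)
within = fraction_within(pred, obs)
print(chems, round(chem_rmse, 3), round(within, 3))
```

Because the values are on a log10 scale, an absolute difference of 1.0 corresponds to a tenfold difference in mg/kg/day, which is why the tolerance is set to 1.0 here.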