Pradeep Prachi, Friedman Katie Paul, Judson Richard
Oak Ridge Institute for Science and Education, Oak Ridge, Tennessee.
Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina.
Comput Toxicol. 2020 Nov 1;16(November 2020). doi: 10.1016/j.comtox.2020.100139.
Human health risk assessment for environmental chemical exposure is limited because the vast majority of chemicals have little or no experimental toxicity data. Data gap filling techniques, such as quantitative structure-activity relationship (QSAR) models based on chemical structure information, can predict hazard in the absence of experimental data. Risk assessment requires identification of a quantitative point-of-departure (POD) value, the point on the dose-response curve that marks the beginning of a low-dose extrapolation. This study presents two sets of QSAR models to predict POD values for repeat dose toxicity. For training and validation, a publicly available toxicity dataset for 3592 chemicals was compiled using the U.S. Environmental Protection Agency's Toxicity Value database (ToxValDB). The first set of QSAR models predicts point estimates of POD values using structural and physicochemical descriptors for repeat dose study type and species combinations. A random forest QSAR model using study type and species as descriptors showed the best performance, with an external test set root mean square error (RMSE) of 0.71 log-mg/kg/day and a coefficient of determination (R²) of 0.53. The second set of QSAR models predicts 95% confidence intervals for the POD. For these models, a POD distribution was constructed with a mean equal to the median POD value and a standard deviation of 0.5 log-mg/kg/day, a value based on previously published estimates of typical study-to-study variability, which may contribute to uncertainty in model predictions. Bootstrap resampling of the pre-generated POD distribution was used to derive point estimates and 95% confidence intervals for each POD prediction. Enrichment analysis to evaluate the accuracy of the predicted POD values showed that 80% of the 5% most potent chemicals were found in the top 20% of the most potent chemical predictions, suggesting that the repeat dose POD QSAR models presented here may help inform screening-level human health risk assessments in the absence of other data.
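The performance measures named in the abstract (RMSE, R², and the enrichment of the most potent chemicals among the most potent predictions) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the toy data, and the rounding used to size the 5% and 20% subsets are assumptions made for illustration, and "most potent" is taken here to mean the lowest POD values.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same log units as the inputs."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def enrichment(observed, predicted, true_frac=0.05, pred_frac=0.20):
    """Fraction of the most potent chemicals (lowest observed POD) that also
    fall among the most potent predictions (lowest predicted POD)."""
    n = len(observed)
    # Assumption: subset sizes are rounded, with at least one chemical each.
    n_true = max(1, round(n * true_frac))
    n_pred = max(1, round(n * pred_frac))
    by_observed = sorted(range(n), key=lambda i: observed[i])
    by_predicted = sorted(range(n), key=lambda i: predicted[i])
    most_potent = set(by_observed[:n_true])
    flagged = set(by_predicted[:n_pred])
    return len(most_potent & flagged) / n_true


if __name__ == "__main__":
    # Hypothetical log-scale POD values for 20 chemicals (not study data).
    observed = [0.1 * i for i in range(20)]
    # Toy predictions: alternately 0.5 log units too high or too low.
    predicted = [o + (0.5 if i % 2 == 0 else -0.5) for i, o in enumerate(observed)]
    print("RMSE:", rmse(observed, predicted))
    print("R^2:", r_squared(observed, predicted))
    print("Enrichment:", enrichment(observed, predicted))
```

With the default arguments, `enrichment` returns the quantity reported in the abstract: the share of the 5% most potent chemicals that appear in the 20% most potent predictions.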