Division of Pre-clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland 20850, United States.
State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center of Eco-Environment Sciences, Chinese Academy of Sciences, Beijing 100085, China.
Chem Res Toxicol. 2020 Mar 16;33(3):731-741. doi: 10.1021/acs.chemrestox.9b00305. Epub 2020 Mar 3.
Traditional toxicity testing reliant on animal models is costly and low throughput, posing a significant challenge given the increasing number of chemicals that humans are exposed to in the environment. The purpose of this investigation was to build optimal prediction models for various human/organ-level toxicity end points (extracted from ChemIDPlus) using chemical structure and Tox21 quantitative high-throughput screening (qHTS) bioactivity assay data. Several supervised machine learning algorithms were applied to model 14 human toxicity end points pertaining to vascular, kidney, ureter and bladder, and liver organ systems. Three metrics were used to evaluate model performance: area under the receiver operating characteristic curve (AUC-ROC), balanced accuracy (BA), and Matthews correlation coefficient (MCC). The top four models, with AUC-ROC values >0.8, were derived for endocrine (0.90 ± 0.00), musculoskeletal (0.88 ± 0.02), peripheral nerve and sensation (0.85 ± 0.01), and brain and coverings (0.83 ± 0.02) toxicities, whereas the best model AUC-ROC values were >0.7 for the remaining 10 toxicities. Model performance was found to be dependent on the specific data set, model type, and feature selection method used. In addition, chemical structure and assay data showed different levels of contribution to the prediction of different toxicity end points. Although assay data, when combined with chemical structure, slightly improved the predictive accuracy for most end points (11 out of 14), a noteworthy finding was the near equal success of the structure-only models, which do not require Tox21 qHTS screening data, and the relatively poor performance of assay-only models. Thus, the top-performing structure-only models from this study could be applied for hazard screening of large sets of chemicals for potential human toxicity, whereas the largest assay contributions to models (i.e., cellular targets) could be used, along with the top-contributing structural features, to provide insight into toxicity mechanisms.
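For readers unfamiliar with the three evaluation metrics named in the abstract, the following is a minimal sketch of their standard definitions for a binary toxic/non-toxic classifier. It is not the authors' code, and the abstract does not say which software was used; the function names and the 0/1 label encoding are assumptions made for illustration only.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1 (assumed encoding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """BA = mean of sensitivity and specificity; assumes both classes are present."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def matthews_corrcoef(y_true, y_pred):
    """MCC from the confusion matrix; returns 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a random positive outscores a random
    negative (ties count one half); assumes both classes are present."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

BA and MCC are computed from thresholded predictions, whereas AUC-ROC is computed from the continuous scores, so the three metrics can rank the same set of models differently, particularly on imbalanced data sets.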