Leveraging heterogeneous data from GHS toxicity annotations, molecular and protein target descriptors and Tox21 assay readouts to predict and rationalise acute toxicity.

Author information

Allen Chad H G, Mervin Lewis H, Mahmoud Samar Y, Bender Andreas

Affiliation

Department of Chemistry, Centre for Molecular Informatics, Lensfield Road, Cambridge, CB2 1EW, UK.

Publication information

J Cheminform. 2019 May 31;11(1):36. doi: 10.1186/s13321-019-0356-5.

Abstract

Despite increasing knowledge in both the chemical and biological domains, the assimilation and exploration of heterogeneous datasets, which encode information about the chemical, bioactivity and phenotypic properties of compounds, remains a challenge because it requires overlap between the chemicals assayed across the spaces. Here, we have constructed a novel dataset, larger than we have used in prior work, comprising 579 acute oral toxic compounds and 1427 non-toxic compounds derived from regulatory GHS information, along with their corresponding molecular and protein target descriptors and qHTS in vitro assay readouts from the Tox21 project. We found no clear association between the results of a FAFDrugs4 toxicophore screen and the acute oral toxicity classifications for our compound set; a screen using a subset of the ToxAlerts toxicophores was also of limited utility, with only slight enrichment toward the toxic set (odds ratio of 1.48). We then investigated to what degree toxic and non-toxic compounds could be separated in each of the spaces, to compare their potential contribution to further analyses. Using an LDA projection, we found the largest degree of separation between toxicity classes using chemical descriptors (Cohen's d of 1.95) and the lowest using qHTS descriptors (Cohen's d of 0.67). To compare the predictivity of the feature spaces for the toxicity endpoint, we next trained Random Forest (RF) acute oral toxicity classifiers on molecular, protein target or qHTS descriptors. RFs trained on molecular and protein target descriptors were most predictive, with ROC AUC values of 0.80-0.92 and 0.70-0.85, respectively, across three test sets. RFs trained on chemical and protein target descriptors combined exhibited similar predictive performance to the single-domain models (ROC AUC of 0.80-0.91). Model interpretability was improved by the inclusion of protein target descriptors, which allow the identification of specific targets (e.g. Retinal dehydrogenase) with literature links to toxic modes of action (e.g. oxidative stress). The dataset compiled in this study has been made available for future application.
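The two summary statistics quoted in the abstract, the odds ratio for toxicophore enrichment and Cohen's d for class separation, can be illustrated with a minimal Python sketch. This is not the authors' code: the abstract does not state which variant of Cohen's d was used, so the pooled-standard-deviation form is assumed here, and the function names, scores and counts below are hypothetical toy values rather than data from the study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def odds_ratio(toxic_hit, toxic_miss, nontoxic_hit, nontoxic_miss):
    """Odds of a toxicophore alert among toxic compounds relative to non-toxic ones."""
    return (toxic_hit / toxic_miss) / (nontoxic_hit / nontoxic_miss)


# Hypothetical 1-D projection scores (e.g. positions along a discriminant axis).
toxic_scores = [2.0, 3.0, 4.0]
nontoxic_scores = [0.0, 1.0, 2.0]
d = cohens_d(toxic_scores, nontoxic_scores)

# Hypothetical 2x2 contingency counts: alert present or absent, per class.
enrichment = odds_ratio(toxic_hit=30, toxic_miss=70, nontoxic_hit=20, nontoxic_miss=80)

print(f"Cohen's d: {d:.2f}")
print(f"Odds ratio: {enrichment:.2f}")
```

An odds ratio close to 1 means an alert is about equally common in both classes, which is why the reported value of 1.48 is described as only slight enrichment; a larger Cohen's d means the class means lie further apart relative to the within-class spread.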

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c563/6544914/c1ba696e22d6/13321_2019_356_Fig1_HTML.jpg
