Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology.

Affiliations

Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY 14260, USA.

Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY 14260, USA.

Publication information

Molecules. 2022 May 8;27(9):3021. doi: 10.3390/molecules27093021.

Abstract

Humans are exposed to numerous compounds daily, some of which have adverse effects on health. Computational approaches for modeling toxicological data in conjunction with machine learning algorithms have gained popularity over the last few years. Machine learning approaches have been used to predict toxicity-related biological activities using chemical structure descriptors. However, toxicity-related proteomic features have not been fully investigated. In this study, we construct a computational pipeline using machine learning models for predicting the most important protein features responsible for the toxicity of compounds taken from the Tox21 dataset that is implemented within the multiscale Computational Analysis of Novel Drug Opportunities (CANDO) therapeutic discovery platform. Tox21 is a highly imbalanced dataset consisting of twelve in vitro assays, seven from the nuclear receptor (NR) signaling pathway and five from the stress response (SR) pathway, for more than 10,000 compounds. For the machine learning model, we employed a random forest with the combination of Synthetic Minority Oversampling Technique (SMOTE) and the Edited Nearest Neighbor (ENN) method (SMOTE+ENN), which is a resampling method to balance the activity class distribution. Within the NR and SR pathways, the activity of the aryl hydrocarbon receptor (NR-AhR) and the mitochondrial membrane potential (SR-MMP) were two of the top-performing twelve toxicity endpoints with AUCROCs of 0.90 and 0.92, respectively. The top extracted features for evaluating compound toxicity were analyzed for enrichment to highlight the implicated biological pathways and proteins. We validated our enrichment results for the activity of the AhR using a thorough literature search. Our case study showed that the selected enriched pathways and proteins from our computational pipeline are not only correlated with AhR toxicity but also form a cascading upstream/downstream arrangement. 
Our work elucidates significant relationships between the protein-compound interactions computed using CANDO and the biological pathways to which those proteins belong, across twelve toxicity endpoints. This novel study uses machine learning not only to predict and understand toxicity but also to elucidate therapeutic mechanisms at a proteomic level for a variety of toxicity endpoints.
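
The SMOTE+ENN resampling step named in the abstract can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names, the toy two-dimensional data, the neighbor counts (`k=5` for SMOTE, `k=3` for ENN), and the majority-vote editing rule are all assumptions chosen for illustration, and the code assumes binary labels with at least two minority samples.

```python
import random
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_nearest(idx, X, k, candidates):
    """Indices of the k points among `candidates` closest to X[idx], excluding idx."""
    others = [j for j in candidates if j != idx]
    return sorted(others, key=lambda j: sq_dist(X[idx], X[j]))[:k]

def smote(X, y, minority=1, k=5, seed=0):
    """Add synthetic minority samples until both classes have equal size.

    Each synthetic point is a random interpolation between a minority sample
    and one of its k nearest minority neighbors.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_needed = (len(y) - len(min_idx)) - len(min_idx)  # majority minus minority
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        i = rng.choice(min_idx)
        j = rng.choice(k_nearest(i, X, k, min_idx))
        gap = rng.random()
        X_new.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_new.append(minority)
    return X_new, y_new

def enn(X, y, k=3):
    """Edited Nearest Neighbors (simplified): drop every sample whose label
    disagrees with the majority label of its k nearest neighbors."""
    all_idx = range(len(X))
    keep = []
    for i in all_idx:
        votes = Counter(y[j] for j in k_nearest(i, X, k, all_idx))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def smote_enn(X, y, minority=1, seed=0):
    """Oversample with SMOTE, then clean the result with ENN."""
    X_s, y_s = smote(X, y, minority=minority, seed=seed)
    return enn(X_s, y_s)

# Hypothetical toy data: 40 "inactive" and 6 "active" compounds in 2-D.
rng = random.Random(42)
X = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(40)]
X += [[rng.gauss(5, 0.5), rng.gauss(5, 0.5)] for _ in range(6)]
y = [0] * 40 + [1] * 6

X_s, y_s = smote(X, y)      # balanced by oversampling only
X_r, y_r = smote_enn(X, y)  # balanced, then noisy samples removed
print(Counter(y), Counter(y_s), Counter(y_r))
```

In a pipeline like the one described, the resampled training set would then be passed to a classifier such as a random forest, with resampling applied only to training folds so that evaluation data keep their original class distribution.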

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e387/9099959/b18ef3b46a0c/molecules-27-03021-g001.jpg
