CYPlebrity: Machine learning models for the prediction of inhibitors of cytochrome P450 enzymes.

Affiliations

Universität Hamburg, Center for Bioinformatics (ZBH), Hamburg, Bundesstr. 43, 20146, Germany; FQS Poland (Fujitsu Group), Parkowa 11, 30-538 Cracow, Poland.

Universität Hamburg, Center for Bioinformatics (ZBH), Hamburg, Bundesstr. 43, 20146, Germany.

Publication information

Bioorg Med Chem. 2021 Sep 15;46:116388. doi: 10.1016/j.bmc.2021.116388. Epub 2021 Aug 28.

Abstract

The vast majority of approved drugs are metabolized by the five major cytochrome P450 (CYP) isozymes, 1A2, 2C9, 2C19, 2D6 and 3A4. Inhibition of CYP isozymes can cause drug-drug interactions with severe pharmacological and toxicological consequences. Computational methods for the fast and reliable prediction of the inhibition of CYP isozymes by small molecules are therefore of high interest and relevance to pharmaceutical companies and a host of other industries, including the cosmetics and agrochemical industries. Today, a large number of machine learning models for predicting the inhibition of the major CYP isozymes by small molecules are available. With this work we aim to go beyond the coverage of existing models, by combining data from several major public and proprietary sources. More specifically, we used up to 18815 compounds with measured bioactivities to train random forest classification models for the individual CYP isozymes. A major advantage of the new data collection over existing ones is the better representation of the minority class, the CYP inhibitors. With the new data collection we achieved inhibitor-to-non-inhibitor ratios in the order of 1:1 (CYP1A2) to 1:3 (CYP2D6). We show that our models reach competitive performance on external data, with Matthews correlation coefficients (MCCs) ranging from 0.62 (CYP2C19) to 0.70 (CYP2D6), and areas under the receiver operating characteristic curve (AUCs) between 0.89 (CYP2C19) and 0.92 (CYPs 2D6 and 3A4). Importantly, the models show a high level of robustness, reflected in a good predictivity also for compounds that are structurally dissimilar to the compounds represented in the training data. The best models presented in this work are freely accessible for academic research via a web service.
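
The abstract reports Matthews correlation coefficients (MCCs) alongside AUCs. As a minimal sketch of how an MCC is computed from a binary confusion matrix, the snippet below uses hypothetical counts chosen only to mimic a 1:3 inhibitor-to-non-inhibitor ratio; the counts and the function name are illustrative assumptions, not data or code from the paper. It also shows, as general background rather than the authors' stated rationale, why MCC is informative on imbalanced classes: a classifier that labels every compound a non-inhibitor still scores 0.75 accuracy here, but an MCC of 0.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return the MCC for binary confusion-matrix counts.

    By convention this returns 0.0 when the denominator is zero,
    i.e. when an entire row or column of the matrix is empty.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return the fraction of correctly classified compounds."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical test set: 100 inhibitors, 300 non-inhibitors (1:3 ratio).
# These counts are made up for illustration only.
model = dict(tp=70, fn=30, tn=270, fp=30)

# Trivial baseline that predicts "non-inhibitor" for every compound.
baseline = dict(tp=0, fn=100, tn=300, fp=0)

for name, counts in [("model", model), ("baseline", baseline)]:
    print(
        f"{name}: MCC = {matthews_corrcoef(**counts):.2f}, "
        f"accuracy = {accuracy(**counts):.2f}"
    )
# prints:
# model: MCC = 0.60, accuracy = 0.85
# baseline: MCC = 0.00, accuracy = 0.75
```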
