PTML Combinatorial Model of ChEMBL Compounds Assays for Multiple Types of Cancer.

Author information

Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940, Leioa, Spain.

IKERBASQUE, Basque Foundation for Science, 48011, Bilbao, Spain.

Publication information

ACS Comb Sci. 2018 Nov 12;20(11):621-632. doi: 10.1021/acscombsci.8b00090. Epub 2018 Oct 3.

Abstract

Determining the target proteins of new anticancer compounds is a very important task in Medicinal Chemistry. To this end, chemists carry out preclinical assays under a large number of combinations of experimental conditions (c_j). In fact, the ChEMBL database contains the outcomes of 65 534 different preclinical assays of anticancer activity for 35 565 different chemical compounds (1.84 assays per compound). These assays cover different combinations of c_j formed from >70 different biological activity parameters (c_0), >300 different drug targets (c_1), >230 cell lines (c_2), and 5 organisms of assay (c_3) or organisms of the target (c_4). The data set includes a total of 45 833 assays in leukemia, 6227 assays in breast cancer, 2499 assays in ovarian cancer, 3499 in colon cancer, 3159 in lung cancer, 2750 in prostate cancer, 601 in melanoma, etc. This is a very complex data set with multiple Big Data features. It is hard for researchers to rationalize these data in order to extract useful relationships and predict new compounds. In this context, we propose to combine perturbation theory (PT) ideas and machine learning (ML) modeling to solve this combinatorial-like problem. In this work, we report a PTML (PT + ML) model for the ChEMBL data set of preclinical assays of anticancer compounds. This is a simple linear model with only three variables. The model showed an area under the receiver operating characteristic curve (AUROC) of 0.872, specificity Sp(%) = 90.2, sensitivity Sn(%) = 70.6, and overall accuracy Ac(%) = 87.7 in the training series. The model also has Sp(%) = 90.1, Sn(%) = 71.4, and Ac(%) = 87.8 in the external validation series. The model uses PT operators based on multicondition moving averages to capture the complexity of the data set. We also compared the model with nonlinear artificial neural network (ANN) models, obtaining similar results. This supports the hypothesis of a linear relationship between the PT operators and the classification of compounds as anticancer agents under different combinations of assay conditions. Last, we compared the model with other PTML models reported in the literature, concluding that this is the only PTML model able to predict activity against multiple types of cancer. This model is a simple but versatile tool for predicting the targets of anticancer compounds while taking into consideration multiple combinations of experimental conditions in preclinical assays.
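
The abstract does not give the formula for the PT operators or for the reported statistics, so the following Python sketch only illustrates the general ideas under stated assumptions. It assumes that a moving-average operator is the deviation of a compound descriptor from the mean of that descriptor over all assays sharing the same condition, and that Sn, Sp, and Ac are the usual confusion-matrix ratios. The function names, the toy descriptor values, and the condition labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def moving_average_deviation(values, conditions):
    """Return values[i] minus the mean of all values that share conditions[i].

    Assumption: this is one common form of a moving-average PT operator;
    the abstract itself does not state the exact definition.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, cond in zip(values, conditions):
        sums[cond] += value
        counts[cond] += 1
    return [value - sums[cond] / counts[cond]
            for value, cond in zip(values, conditions)]


def classification_metrics(y_true, y_pred):
    """Return sensitivity, specificity, and accuracy as percentages.

    Labels are 1 (active) or 0 (inactive). Both classes must be present
    in y_true, otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "Sn": 100.0 * tp / (tp + fn),
        "Sp": 100.0 * tn / (tn + fp),
        "Ac": 100.0 * (tp + tn) / len(pairs),
    }


# Hypothetical toy data: one descriptor value per assay and the
# activity-parameter label (one kind of condition) for each assay.
descriptor = [2.0, 4.0, 1.0, 3.0]
activity_parameter = ["IC50", "IC50", "Ki", "Ki"]
deviations = moving_average_deviation(descriptor, activity_parameter)

# Hypothetical observed and predicted classes for five assays.
metrics = classification_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

In a multicondition setting, the same deviation could be computed once per condition type (activity parameter, target, cell line, organism) and the resulting columns used as inputs to a linear classifier; whether the paper combines them in this way is not stated in the abstract.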
