QSAR-derived affinity fingerprints (part 2): modeling performance for potency prediction.

Authors

Cortés-Ciriano Isidro, Škuta Ctibor, Bender Andreas, Svozil Daniel

Affiliations

Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, CB10 1SD, UK.

Publication information

J Cheminform. 2020 Jun 5;12(1):41. doi: 10.1186/s13321-020-00444-5.

Abstract

Affinity fingerprints report the activity of small molecules across a set of assays. They therefore make it possible to gather information about the bioactivities of structurally dissimilar compounds, where models based on chemical structure alone are often limited, and to model complex biological endpoints such as human toxicity and in vitro cancer cell line sensitivity. Here, we propose to model in vitro compound activity using computationally predicted bioactivity profiles as compound descriptors. To this end, we apply and validate a framework for the calculation of QSAR-derived affinity fingerprints (QAFFP) using a set of 1360 QSAR models generated from Ki, Kd, IC50 and EC50 data from the ChEMBL database. QAFFP thus represent a method to encode and relate compounds on the basis of their similarity in bioactivity space. To benchmark the predictive power of QAFFP, we assembled IC50 data from the ChEMBL database for 18 diverse cancer cell lines widely used in preclinical drug discovery, and for 25 diverse protein target data sets. This study complements part 1, where the performance of QAFFP in similarity searching, scaffold hopping, and bioactivity classification is evaluated. Although QAFFP are inherently noisy, we show that using them as descriptors leads to prediction errors on the test set in the ~0.65-0.95 pIC50 units range, which is comparable to the estimated uncertainty of bioactivity data in ChEMBL (0.76-1.00 pIC50 units). We find that the predictive power of QAFFP is slightly worse than that of Morgan2 fingerprints and 1D and 2D physicochemical descriptors, with an effect size in the 0.02-0.08 pIC50 units range. Including QSAR models with low predictive power in the generation of QAFFP does not lead to improved predictive power. Given that the QSAR models we used to compute the QAFFP were selected on the basis of data availability alone, we anticipate better modeling results for QAFFP generated using more diverse and biologically meaningful targets.
Data sets and Python code are publicly available at https://github.com/isidroc/QAFFP_regression.
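
The idea described in the abstract can be illustrated with a minimal, self-contained sketch. It is not the code from the authors' repository: the "QSAR models" are hypothetical linear functions over toy descriptors, the endpoint is synthetic, and a simple k-nearest-neighbour regressor stands in for whatever learning algorithm is used in the paper. All function names below are assumptions made for illustration. The sketch shows the three steps involved: converting IC50 to pIC50, building a fingerprint from the predictions of a panel of models, and measuring the test-set error as an RMSE in pIC50 units.

```python
import math
import random


def pic50_from_nm(ic50_nm):
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    return 9.0 - math.log10(ic50_nm)


def make_linear_model(weights, bias):
    """Hypothetical stand-in for one trained QSAR model (a linear function)."""
    def predict(x):
        return bias + sum(w * xi for w, xi in zip(weights, x))
    return predict


def qaffp(descriptors, qsar_models):
    """Affinity fingerprint: one predicted potency per QSAR model."""
    return [model(descriptors) for model in qsar_models]


def knn_predict(query_fp, train_fps, train_y, k=3):
    """Mean potency of the k nearest training compounds in fingerprint space."""
    ranked = sorted((math.dist(query_fp, fp), y) for fp, y in zip(train_fps, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


rng = random.Random(0)
n_desc, n_models = 5, 8

# Panel of toy "QSAR models" (assumption: random linear functions).
models = [
    make_linear_model([rng.uniform(-1, 1) for _ in range(n_desc)], rng.uniform(5, 7))
    for _ in range(n_models)
]

# Synthetic compounds and a synthetic pIC50 endpoint with added noise.
compounds = [[rng.uniform(0, 1) for _ in range(n_desc)] for _ in range(60)]
y = [6.0 + x[0] - 0.5 * x[1] + rng.gauss(0, 0.2) for x in compounds]

# Encode every compound by its predicted bioactivity profile.
fingerprints = [qaffp(x, models) for x in compounds]

# Simple hold-out split: first 45 compounds for training, the rest for testing.
train_fps, test_fps = fingerprints[:45], fingerprints[45:]
train_y, test_y = y[:45], y[45:]

predictions = [knn_predict(fp, train_fps, train_y) for fp in test_fps]
test_rmse = rmse(test_y, predictions)
print(f"Test RMSE: {test_rmse:.2f} pIC50 units")
```

In a realistic setting, each entry of `models` would be a QSAR model trained on ChEMBL data for one target, and the inputs would be structure-derived descriptors rather than random numbers.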
