GTM-Based QSAR Models and Their Applicability Domains.

Authors

Gaspar H A, Baskin I I, Marcou G, Horvath D, Varnek A

Affiliations

Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France.

Department of Physics, Moscow State University, 119991, Moscow, Russian Federation.

Publication information

Mol Inform. 2015 Jun;34(6-7):348-56. doi: 10.1002/minf.201400153. Epub 2015 Feb 3.

Abstract

In this paper we demonstrate that Generative Topographic Mapping (GTM), a machine learning method traditionally used for data visualisation, can be efficiently applied to QSAR modelling using probability distribution functions (PDF) computed in the 2-dimensional latent space. Several scenarios of activity assessment were considered: (i) the "activity landscape" approach based on direct use of the PDF, (ii) QSAR models built on GTM-generated descriptors derived from the PDF, and (iii) the k-Nearest Neighbours approach in the 2D latent space. Benchmarking calculations were performed on five datasets: stability constants of complexes of the metal cations Ca²⁺, Gd³⁺ and Lu³⁺ with organic ligands in water, aqueous solubility, and activity of thrombin inhibitors. The performance of GTM-based regression models is similar to that obtained with some popular machine-learning methods (random forest, k-NN, M5P regression tree and PLS) and ISIDA fragment descriptors. By comparing GTM activity landscapes built on predicted and on experimental activities, one can visually assess a model's performance and identify the areas of chemical space that correspond to reliable predictions. The applicability domain used in this work is based on data likelihood. Its application significantly improved model performance for 4 out of 5 datasets.
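
The abstract does not give formulas, so the following is only a minimal sketch of how scenario (i) and a likelihood-based applicability domain could look, assuming a GTM has already been fitted elsewhere. It assumes the "activity landscape" is the responsibility-weighted mean activity at each latent grid node, and that a molecule is inside the applicability domain when its log-likelihood is at least a chosen percentile of the training log-likelihoods. The function names, the `eps` guard and the percentile cutoff are all hypothetical and are not taken from the paper.

```python
import numpy as np

def node_activities(resp_train, y_train, eps=1e-12):
    """Responsibility-weighted mean activity at each latent grid node.

    resp_train: (N, K) responsibilities of N training molecules over K nodes.
    y_train:    (N,) experimental activities.
    """
    weights = resp_train.sum(axis=0)  # total responsibility per node, shape (K,)
    # eps is an assumed guard against nodes that no molecule populates
    return (resp_train.T @ y_train) / (weights + eps)

def predict_from_landscape(resp_query, landscape):
    """Predict activity as the landscape averaged over each query's responsibilities."""
    return resp_query @ landscape

def in_applicability_domain(loglik_query, loglik_train, percentile=5.0):
    """True where a query log-likelihood reaches the given percentile of the
    training log-likelihoods (an assumed cutoff rule, not the paper's)."""
    threshold = np.percentile(loglik_train, percentile)
    return np.asarray(loglik_query) >= threshold

# Toy data: three training molecules, two latent nodes
resp_train = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
y_train = np.array([1.0, 3.0, 2.0])
landscape = node_activities(resp_train, y_train)
y_pred = predict_from_landscape(np.array([[0.5, 0.5]]), landscape)
ad_mask = in_applicability_domain([-7.5, -12.0],
                                  np.array([-10.0, -9.0, -8.0, -7.0, -6.0]),
                                  percentile=50.0)
```

In this sketch, predictions for molecules flagged as outside the domain would simply be discarded before computing performance statistics, which is one plausible reading of how an applicability domain can improve reported model performance.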
