New approach to pharmacophore mapping and QSAR analysis using inductive logic programming. Application to thermolysin inhibitors and glycogen phosphorylase B inhibitors.

Author information

Marchand-Geneste Nathalie, Watson Kimberly A, Alsberg Bjørn K, King Ross D

Affiliation

Department of Computer Science, Computational Biology Group, The University of Wales Aberystwyth, Penglais Campus, Aberystwyth, Ceredigion SY23 3DB, Wales, UK.

Publication information

J Med Chem. 2002 Jan 17;45(2):399-409. doi: 10.1021/jm0155244.

PMID:11784144

Abstract

A key problem in QSAR is the selection of appropriate descriptors to form accurate regression equations for the compounds under study. Inductive logic programming (ILP) algorithms are a class of machine-learning algorithms that have been successfully applied to a number of SAR problems. Unlike other QSAR methods, which use attributes to describe chemical structure, ILP uses relations. This gives ILP the advantages of not requiring explicit superimposition of individual compounds in a dataset, of dealing naturally with multiple conformations, and of using a language much closer to that used normally by chemists. We unify ILP and standard regression techniques to give a QSAR method that combines the strength of ILP at describing steric structure with the familiarity and power of regression methods. Complex pharmacophores, correlating with activity, were identified and used as new indicator variables, along with the comparative molecular field analysis (CoMFA) prediction, to form predictive regression equations. We compared the formation of 3D-QSARs using standard CoMFA with the use of ILP on the well-studied thermolysin zinc protease inhibitor dataset and a glycogen phosphorylase (GP) inhibitor dataset. In each case the addition of ILP variables produced statistically better results than the CoMFA analysis (P < 0.01 for the thermolysin dataset and P < 0.05 for the GP dataset). Moreover, the new ILP variables were not found to increase the complexity of the final QSAR equations and gave possible insight into the binding mechanism of the ligand-protein complex under study.
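The abstract describes the central step only at a high level: each ILP-derived pharmacophore becomes a 0/1 indicator variable that enters a regression alongside the CoMFA prediction. The Python sketch below illustrates that idea under stated assumptions and is not the authors' implementation. A pharmacophore is assumed here to be a list of feature types plus pairwise target distances in Å (a hypothetical representation, not the ILP system's rule syntax); a molecule scores 1 if any of its conformations satisfies every constraint within a tolerance `tol`. The gain from adding the indicators is measured with a partial F statistic for nested least-squares models, which is one standard test and not necessarily the one used in the paper. All function names and the toy feature labels are hypothetical.

```python
import itertools
import math

# Hypothetical representation (assumption, not the paper's ILP rule format):
#   molecule     = list of conformations
#   conformation = list of (feature_type, (x, y, z)) tuples
#   pharmacophore = point_types (list of str) + distances {(i, j): target_angstrom}


def conformation_matches(conf, point_types, distances, tol=1.0):
    """True if distinct features of one conformation can be assigned to the
    pharmacophore points so every type and pairwise-distance constraint holds.
    Only intramolecular distances are used, so no superimposition is needed."""
    for combo in itertools.permutations(conf, len(point_types)):
        if any(feat[0] != t for feat, t in zip(combo, point_types)):
            continue
        if all(abs(math.dist(combo[i][1], combo[j][1]) - d) <= tol
               for (i, j), d in distances.items()):
            return True
    return False


def indicator(molecule, point_types, distances, tol=1.0):
    """1 if ANY conformation of the molecule contains the pharmacophore, else 0."""
    return int(any(conformation_matches(c, point_types, distances, tol)
                   for c in molecule))


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_rss(X, y):
    """Ordinary least squares via the normal equations; returns (beta, residual sum of squares)."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(X, y))
    return beta, rss


def partial_f(comfa_pred, indicators, activity):
    """Compare activity ~ 1 + CoMFA (reduced) with activity ~ 1 + CoMFA + indicators (full).
    `indicators` holds one row of 0/1 pharmacophore indicators per compound.
    Returns (full-model coefficients, partial F statistic)."""
    n = len(activity)
    reduced = [[1.0, c] for c in comfa_pred]
    full = [[1.0, c] + list(ind) for c, ind in zip(comfa_pred, indicators)]
    _, rss_r = ols_rss(reduced, activity)
    beta_f, rss_f = ols_rss(full, activity)
    q = len(full[0]) - len(reduced[0])   # number of added indicator columns
    df_full = n - len(full[0])           # residual degrees of freedom of the full model
    return beta_f, ((rss_r - rss_f) / q) / (rss_f / df_full)
```

In use, one would compute `indicator(...)` for every compound and every selected pharmacophore, pass those rows to `partial_f` together with the cross-validated CoMFA predictions and measured activities, and compare the returned statistic against an F distribution with `q` and `df_full` degrees of freedom. Taking `any` over conformations is one simple way to reflect the abstract's point that a relational description handles multiple conformers without choosing a single aligned pose.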