A general approach for developing system-specific functions to score protein-ligand docked complexes using support vector inductive logic programming.

Author information

Amini Ata, Shrimpton Paul J, Muggleton Stephen H, Sternberg Michael J E

Affiliation

Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College London, London SW7 2AY, United Kingdom.

Publication information

Proteins. 2007 Dec 1;69(4):823-31. doi: 10.1002/prot.21782.

PMID:17910057

Abstract

Although increases in computational power have led to greater use of protein-ligand and protein-protein docking in drug discovery, accurately ranking the binding affinities of a series of ligands or proteins docked to a protein receptor remains a largely unsolved problem. This problem is of major concern in lead optimization and has led to the development of scoring functions tailored to rank the binding affinities of a series of ligands to a specific system. However, such methods can take a long time to develop, and their transferability to other systems remains open to question. Here we demonstrate that, given a suitable amount of background information, a new approach using support vector inductive logic programming (SVILP) can be used to produce system-specific scoring functions. Inductive logic programming (ILP) learns logic-based rules for a given dataset that describe the properties of each member of the set qualitatively. Combining ILP with support vector machine regression yields a quantitative set of rules. SVILP has previously been used in a biological context to examine datasets containing series of individual molecular structures and their properties. Here we describe the use of SVILP to predict the binding affinities of a series of ligands to a particular protein. We also examine, for the first time, the applicability of SVILP techniques to datasets consisting of protein-ligand complexes. Our results show that SVILP performs comparably with other state-of-the-art methods on five protein-ligand systems, as judged by similar cross-validated squared correlation coefficients. A McNemar test comparing SVILP with CoMFA and CoMSIA across the five systems indicates that our method is significantly better on one occasion. We demonstrate that the SVILP-produced rules can be displayed graphically and understood, and this feature of ILP can be used to derive hypotheses for future ligand design in lead optimization. The approach can readily be extended to evaluate the binding affinities of a series of protein-protein complexes.
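As a rough illustration of the idea summarized in the abstract, the sketch below represents each docked complex by a 0/1 vector recording which ILP-style rules cover it, compares complexes by counting the rules they share, and scores predictions with a leave-one-out cross-validated q². Everything here is an assumption made for illustration: the function names, the unweighted shared-rule kernel, and the q² definition (1 minus PRESS over the total sum of squares, the form commonly used in QSAR work) are not taken from the paper. A kernel-weighted average also stands in for support vector regression so that the example needs no external library.

```python
def rule_kernel(a, b):
    """Count the rules that cover both complexes (dot product of 0/1 vectors)."""
    return sum(x * y for x, y in zip(a, b))


def predict(train_x, train_y, query):
    """Kernel-weighted average of training affinities.

    This is a simple stand-in for support vector regression, not SVR itself.
    """
    weights = [rule_kernel(x, query) for x in train_x]
    total = sum(weights)
    if total == 0:
        # No shared rules with any training complex: fall back to the mean.
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total


def loo_q2(xs, ys):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total (assumed form)."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        preds.append(predict(train_x, train_y, xs[i]))
    mean_y = sum(ys) / len(ys)
    press = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


# Hypothetical data: four complexes, two rules, made-up affinities.
coverage = [[1, 0], [1, 0], [0, 1], [0, 1]]
affinity = [5.0, 5.0, 8.0, 8.0]
print(loo_q2(coverage, affinity))
```

In this toy setting each held-out complex shares a rule only with the one other complex of equal affinity, so the held-out predictions are exact. Real rule sets would be learned by an ILP system, and the regression step would be carried out by a support vector machine rather than the averaging used here.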
