Discovering rules for protein-ligand specificity using support vector inductive logic programming.

Author information

Kelley Lawrence A, Shrimpton Paul J, Muggleton Stephen H, Sternberg Michael J E

Affiliation

Structural Bioinformatics Group, Division of Molecular Biosciences, Imperial College London, London, UK.

Publication information

Protein Eng Des Sel. 2009 Sep;22(9):561-7. doi: 10.1093/protein/gzp035. Epub 2009 Jul 2.

DOI:10.1093/protein/gzp035

PMID:19574295

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3913550/

Abstract

Structural genomics initiatives are rapidly generating vast numbers of protein structures. Comparative modelling is also capable of producing accurate structural models for many protein sequences. However, for many of the known structures, functions are not yet determined, and in many modelling tasks, an accurate structural model does not necessarily tell us about function. Thus, there is a pressing need for high-throughput methods for determining function from structure. The spatial arrangement of key amino acids in a folded protein, on the surface or buried in clefts, is often the determinant of its biological function. A central aim of molecular biology is to understand the relationship between such substructures or surfaces and biological function, leading both to function prediction and to function design. We present a new general method for discovering the features of binding pockets that confer specificity for particular ligands. Using a recently developed machine-learning technique that couples the rule-discovery approach of inductive logic programming with the statistical learning power of support vector machines, we are able to discriminate, with high precision (90%) and recall (86%), between pockets that bind FAD and those that bind NAD on a large benchmark set, given only the geometry and composition of the backbone of the binding pocket and without the use of docking. In addition, we learn rules governing this specificity which can feed into protein functional design protocols. An analysis of the rules found suggests that key features of the binding pocket may be tied to conformational freedom in the ligand. The representation is sufficiently general to be applicable to any discriminatory binding problem. All programs and data sets are freely available to non-commercial users at http://www.sbg.bio.ic.ac.uk/svilp_ligand/.
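For readers less familiar with the two metrics quoted in the abstract, the following is a minimal sketch of how precision and recall are computed from classification counts. It assumes one ligand class (say, FAD-binding pockets) is treated as the positive class, which the abstract does not specify. The counts are hypothetical values chosen only for illustration and are not taken from the paper; the function name `precision_recall` is likewise an illustrative choice.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from confusion-matrix counts.

    tp: positives correctly predicted as positive
    fp: negatives wrongly predicted as positive
    fn: positives wrongly predicted as negative
    """
    # Precision: of everything predicted positive, what fraction was right.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all true positives, what fraction was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for illustration only (NOT data from the paper).
tp, fp, fn = 86, 10, 14
precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

High precision means few pockets are wrongly assigned to the positive class, while high recall means few true members of that class are missed; reporting both guards against a classifier that does well on only one of the two.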

Similar articles

A general approach for developing system-specific functions to score protein-ligand docked complexes using support vector inductive logic programming.

Proteins. 2007 Dec 1;69(4):823-31. doi: 10.1002/prot.21782.

Conformational diversity of ligands bound to proteins.

J Mol Biol. 2006 Mar 3;356(4):928-44. doi: 10.1016/j.jmb.2005.12.012. Epub 2005 Dec 20.

Adenine recognition: a motif present in ATP-, CoA-, NAD-, NADP-, and FAD-dependent proteins.

Proteins. 2001 Aug 15;44(3):282-91. doi: 10.1002/prot.1093.

Predicting flavin and nicotinamide adenine dinucleotide-binding sites in proteins using the fragment transformation method.

Biomed Res Int. 2015;2015:402536. doi: 10.1155/2015/402536. Epub 2015 Apr 27.

A novel logic-based approach for quantitative toxicology prediction.

J Chem Inf Model. 2007 May-Jun;47(3):998-1006. doi: 10.1021/ci600223d. Epub 2007 Apr 24.

A new protein binding pocket similarity measure based on comparison of clouds of atoms in 3D: application to ligand prediction.

BMC Bioinformatics. 2010 Feb 22;11:99. doi: 10.1186/1471-2105-11-99.

Automated identification of protein-ligand interaction features using Inductive Logic Programming: a hexose binding case study.

BMC Bioinformatics. 2012 Jul 11;13:162. doi: 10.1186/1471-2105-13-162.

A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking.

Bioinformatics. 2010 May 1;26(9):1169-75. doi: 10.1093/bioinformatics/btq112. Epub 2010 Mar 17.

Cofactor-binding sites in proteins of deviating sequence: comparative analysis and clustering in torsion angle, cavity, and fold space.

Proteins. 2012 Feb;80(2):626-48. doi: 10.1002/prot.23226. Epub 2011 Nov 17.

Cited by

Sunsetting Binding MOAD with its last data update and the addition of 3D-ligand polypharmacology tools.

Sci Rep. 2023 Feb 21;13(1):3008. doi: 10.1038/s41598-023-29996-w.

LIMLE, a new molecule over-expressed following activation, is involved in the stimulatory properties of dendritic cells.

PLoS One. 2014 Apr 4;9(4):e93894. doi: 10.1371/journal.pone.0093894. eCollection 2014.

Homology modeling and structural comparison of leucine rich repeats of Toll like receptors 1-10 of ruminants.

J Mol Model. 2013 Sep;19(9):3863-74. doi: 10.1007/s00894-013-1871-3. Epub 2013 Jun 28.

Knowledge discovery in variant databases using inductive logic programming.

Bioinform Biol Insights. 2013 Mar 18;7:119-31. doi: 10.4137/BBI.S11184. Print 2013.

References

The sequence-structure relationship and protein function prediction.

Curr Opin Struct Biol. 2009 Jun;19(3):357-62. doi: 10.1016/j.sbi.2009.03.008. Epub 2009 May 4.

A general approach for developing system-specific functions to score protein-ligand docked complexes using support vector inductive logic programming.

Proteins. 2007 Dec 1;69(4):823-31. doi: 10.1002/prot.21782.

A novel logic-based approach for quantitative toxicology prediction.

J Chem Inf Model. 2007 May-Jun;47(3):998-1006. doi: 10.1021/ci600223d. Epub 2007 Apr 24.

Support vector inductive logic programming outperforms the naive Bayes classifier and inductive logic programming for the classification of bioactive chemical compounds.

J Comput Aided Mol Des. 2007 May;21(5):269-80. doi: 10.1007/s10822-007-9113-3. Epub 2007 Mar 27.

Protein function prediction using local 3D templates.

J Mol Biol. 2005 Aug 19;351(3):614-26. doi: 10.1016/j.jmb.2005.05.067.

Binding MOAD (Mother Of All Databases).

Proteins. 2005 Aug 15;60(3):333-40. doi: 10.1002/prot.20512.

Functional genomic hypothesis generation and experimentation by a robot scientist.

Nature. 2004 Jan 15;427(6971):247-52. doi: 10.1038/nature02236.

The automatic discovery of structural principles describing protein fold space.

J Mol Biol. 2003 Jul 18;330(4):839-50. doi: 10.1016/s0022-2836(03)00620-x.

Annotation in three dimensions. PINTS: Patterns in Non-homologous Tertiary Structures.

Nucleic Acids Res. 2003 Jul 1;31(13):3341-4. doi: 10.1093/nar/gkg506.

Recognition templates for predicting adenylate-binding sites in proteins.

J Mol Biol. 2001 Dec 14;314(5):1245-55. doi: 10.1006/jmbi.2000.5201.
