Kelley Lawrence A, Shrimpton Paul J, Muggleton Stephen H, Sternberg Michael J E
Structural Bioinformatics Group, Division of Molecular Biosciences, Imperial College London, London, UK.
Protein Eng Des Sel. 2009 Sep;22(9):561-7. doi: 10.1093/protein/gzp035. Epub 2009 Jul 2.
Structural genomics initiatives are rapidly generating vast numbers of protein structures. Comparative modelling is also capable of producing accurate structural models for many protein sequences. However, for many of the known structures, functions are not yet determined, and in many modelling tasks, an accurate structural model does not necessarily tell us about function. Thus, there is a pressing need for high-throughput methods for determining function from structure. The spatial arrangement of key amino acids in a folded protein, on the surface or buried in clefts, is often the determinant of its biological function. A central aim of molecular biology is to understand the relationship between such substructures or surfaces and biological function, leading both to function prediction and to function design. We present a new general method for discovering the features of binding pockets that confer specificity for particular ligands. Using a recently developed machine-learning technique that couples the rule-discovery approach of inductive logic programming with the statistical learning power of support vector machines, we are able to discriminate, with high precision (90%) and recall (86%), between pockets that bind FAD and those that bind NAD on a large benchmark set, given only the geometry and composition of the backbone of the binding pocket and without the use of docking. In addition, we learn rules governing this specificity, which can feed into protein functional design protocols. An analysis of the rules found suggests that key features of the binding pocket may be tied to conformational freedom in the ligand. The representation is sufficiently general to be applicable to any discriminatory binding problem. All programs and data sets are freely available to non-commercial users at http://www.sbg.bio.ic.ac.uk/svilp_ligand/.
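For readers unfamiliar with the two reported metrics, the following minimal sketch shows how precision and recall are conventionally computed for a two-class problem such as FAD versus NAD. It is not the authors' evaluation code: the function name `precision_recall`, the choice of FAD as the positive class, and the toy labels are all assumptions made for illustration, and the toy labels are not data from the paper.

```python
def precision_recall(y_true, y_pred, positive="FAD"):
    """Return (precision, recall) for the class named by `positive`.

    Treating FAD as the positive class is an assumption of this sketch;
    the abstract does not state which class was treated as positive.
    """
    pairs = list(zip(y_true, y_pred))
    # True positives: FAD pockets predicted as FAD.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: non-FAD pockets predicted as FAD.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: FAD pockets predicted as something else.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels invented for illustration only (not results from the paper).
y_true = ["FAD", "FAD", "FAD", "NAD", "NAD", "NAD"]
y_pred = ["FAD", "FAD", "NAD", "FAD", "NAD", "NAD"]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the fraction of pockets predicted as the positive class that truly belong to it, and recall is the fraction of true positive-class pockets that the classifier recovers.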