Strömbergsson Helena, Kryshtafovych Andriy, Prusis Peteris, Fidelis Krzysztof, Wikberg Jarl E S, Komorowski Jan, Hvidsten Torgeir R
The Linnaeus Centre for Bioinformatics, Uppsala University, SE-751 24, Uppsala, Sweden.
Proteins. 2006 Nov 15;65(3):568-79. doi: 10.1002/prot.21163.
Modeling and understanding protein-ligand interactions is one of the most important goals in computational drug discovery. To this end, proteochemometrics uses structural and chemical descriptors from several proteins and several ligands to induce interaction models. Here, we present a new and generalized approach in which proteins varying greatly in terms of sequence and structure are represented by a library of local substructures. Using linear regression and rule-based learning, we combine such local substructures with chemical descriptors from the ligands to model binding affinity for a training set of hydrolase and lyase enzymes. We evaluate the predictive performance of these models using cross-validation and sets of unseen ligands with unknown three-dimensional structure. The models are shown to generalize by outperforming models using descriptors from only proteins or only ligands, or models using global structure similarities rather than local similarities. Thus, we demonstrate that this approach is capable of describing dependencies between local structural properties and ligands in otherwise dissimilar protein structures. These dependencies are often, but not always, associated with local substructures that are in contact with the ligands. Finally, we show that strongly bound enzyme-ligand complexes require the presence of particular local substructures, while weakly bound complexes may be described by the absence of certain properties. The results demonstrate that the alignment-independent approach using local substructures is capable of describing protein-ligand interaction for largely different proteins and hence opens the way to proteochemometric analysis of the interaction space of entire proteomes. Current approaches are limited to families of closely related proteins.
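The modeling setup described in the abstract (local-substructure descriptors for proteins combined with chemical descriptors for ligands, fitted by linear regression and assessed by cross-validation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the data are synthetic, the function names (`build_features`, `fit_ridge`, `cross_validated_q2`) are hypothetical, the protein-ligand cross-terms are a common proteochemometric choice that the abstract does not confirm, and a small ridge penalty is swapped in for plain least squares to keep the fit numerically stable.

```python
import numpy as np


def build_features(protein_desc, ligand_desc):
    """Stack protein descriptors, ligand descriptors, and their pairwise products.

    The cross-terms are an assumption of this sketch; they let a linear model
    express that a ligand property matters only when a substructure is present.
    """
    n = len(protein_desc)
    cross = np.einsum("ij,ik->ijk", protein_desc, ligand_desc).reshape(n, -1)
    return np.hstack([protein_desc, ligand_desc, cross])


def fit_ridge(X, y, alpha=1.0):
    """Fit centered linear regression with an L2 penalty; return (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def cross_validated_q2(X, y, n_folds=5, alpha=1.0, seed=0):
    """Return the cross-validated coefficient of determination (Q^2)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    pred = np.empty(len(y), dtype=float)
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        w, b = fit_ridge(X[train_mask], y[train_mask], alpha)
        pred[test_idx] = X[test_idx] @ w + b
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (NOT from the paper): 200 protein-ligand pairs,
# 6 binary "substructure present" flags and 4 continuous ligand descriptors.
rng = np.random.default_rng(42)
n_pairs = 200
protein_desc = rng.integers(0, 2, size=(n_pairs, 6)).astype(float)
ligand_desc = rng.normal(size=(n_pairs, 4))

# Toy affinity that depends on one substructure interacting with one ligand property.
affinity = (
    2.0 * protein_desc[:, 0] * ligand_desc[:, 1]
    + ligand_desc[:, 0]
    + rng.normal(scale=0.1, size=n_pairs)
)

X_full = build_features(protein_desc, ligand_desc)
q2_full = cross_validated_q2(X_full, affinity)
q2_ligand_only = cross_validated_q2(ligand_desc, affinity)
print(f"Q2 (protein + ligand + cross-terms): {q2_full:.3f}")
print(f"Q2 (ligand descriptors only):        {q2_ligand_only:.3f}")
```

On this toy data the combined model scores higher than the ligand-only model because the simulated affinity contains a substructure-by-ligand interaction by construction; the comparison mirrors the kind of baseline the abstract describes but says nothing about the effect sizes reported in the study.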