Rantanen V V, Denessiouk K A, Gyllenberg M, Koski T, Johnson M S
Department of Mathematics, University of Turku, FIN-20014, Turku, Finland.
J Mol Biol. 2001 Oct 12;313(1):197-214. doi: 10.1006/jmbi.2001.5023.
Here, a protein atom-ligand fragment interaction library is described. The library is based on experimentally solved structures of protein-ligand and protein-protein complexes deposited in the Protein Data Bank (PDB), and it is able to characterize binding sites given a ligand structure suitable for a protein. A set of 30 ligand fragment types was defined, each including three or more atoms, in order to unambiguously define a frame of reference for interactions of ligand atoms with their receptor proteins. Interactions between ligand fragments and 24 classes of protein target atoms plus a water oxygen atom were collected and segregated according to type. The spatial distributions of individual fragment-target atom pairs were visually inspected in order to obtain rough-grained constraints on the interaction volumes. Data fulfilling these constraints were given as input to an iterative expectation-maximization algorithm that produces as output maximum likelihood estimates of the parameters of the finite Gaussian mixture models. Concepts of statistical pattern recognition and the resulting mixture model densities are used (i) to predict the detailed interactions between Chlorella virus DNA ligase and the adenine ring of its ligand and (ii) to evaluate the "error" in prediction for both the training and validation sets of protein-ligand interactions found in the PDB. These analyses demonstrate that this approach can successfully narrow down the possibilities for both the interacting protein atom type and its location relative to a ligand fragment.
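The expectation-maximization step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fit_gmm_em`, the farthest-point initialization, the small regularization term `reg`, and the synthetic 3D coordinates are all assumptions made for illustration. The sketch fits a full-covariance Gaussian mixture to target-atom positions that are assumed to be already expressed in a fragment's local frame of reference.

```python
import numpy as np

def fit_gmm_em(points, n_components, n_iter=200, tol=1e-8, reg=1e-6, seed=0):
    """Fit a full-covariance Gaussian mixture to points of shape (n, d) by EM.

    Hypothetical helper for illustration; returns weights, means,
    covariances and the log-likelihood recorded at each iteration.
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    n, d = points.shape

    # Assumed initialization: one random point, then repeatedly the point
    # farthest from all means chosen so far.
    means = [points[rng.integers(n)]]
    for _ in range(1, n_components):
        d2 = np.min([np.sum((points - m) ** 2, axis=1) for m in means], axis=0)
        means.append(points[np.argmax(d2)])
    means = np.array(means)
    # Every component starts with the overall data covariance and equal weight.
    covs = np.tile(np.cov(points.T) + reg * np.eye(d), (n_components, 1, 1))
    weights = np.full(n_components, 1.0 / n_components)

    log_liks = []
    for _ in range(n_iter):
        # E-step: log(weight_k * N(x | mean_k, cov_k)) for each point and component.
        log_p = np.empty((n, n_components))
        for k in range(n_components):
            chol = np.linalg.cholesky(covs[k])
            z = np.linalg.solve(chol, (points - means[k]).T)  # whitened residuals
            log_det = 2.0 * np.sum(np.log(np.diag(chol)))
            log_p[:, k] = np.log(weights[k]) - 0.5 * (
                d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0)
            )
        # Normalize with log-sum-exp to obtain responsibilities.
        top = log_p.max(axis=1, keepdims=True)
        log_norm = top + np.log(np.exp(log_p - top).sum(axis=1, keepdims=True))
        resp = np.exp(log_p - log_norm)
        log_liks.append(float(log_norm.sum()))

        # M-step: responsibility-weighted estimates of the parameters.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ points) / nk[:, None]
        for k in range(n_components):
            diff = points - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + reg * np.eye(d)

        # Stop once the log-likelihood has stopped changing.
        if len(log_liks) > 1 and abs(log_liks[-1] - log_liks[-2]) < tol:
            break
    return weights, means, covs, log_liks

# Synthetic stand-in for observed target-atom positions around one fragment:
# two interaction clusters at made-up coordinates.
demo_rng = np.random.default_rng(1)
demo_points = np.vstack([
    demo_rng.normal([3.0, 0.0, 0.0], 0.3, size=(200, 3)),
    demo_rng.normal([-1.0, 2.5, 1.0], 0.3, size=(200, 3)),
])
demo_weights, demo_means, demo_covs, demo_log_liks = fit_gmm_em(demo_points, 2)
```

The fitted weights, means and covariances define a density over positions relative to the fragment. In the setting the abstract describes, one such mixture per fragment-target atom pair could then be compared across target atom classes at a candidate position, although how the paper combines these densities is not specified here.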