Diago Luis A, Morell Persy, Aguilera Longendri, Moreno Ernesto
Department of Bioengineering, Faculty of Electrical Engineering, Havana Institute of Technology, Havana 19390, Cuba.
BMC Bioinformatics. 2007 Aug 25;8:310. doi: 10.1186/1471-2105-8-310.
The number of algorithms available to predict ligand-protein interactions is large and growing, yet the test sets used to validate these methods are usually small and problem-dependent. Several databases built on the Protein Data Bank (PDB) have recently been released to further the understanding of protein-ligand interactions. Nevertheless, testing docking methods on a large variety of complexes remains difficult. In this paper we report the development of a new database of protein-ligand complexes tailored for testing docking algorithms.
Using a new definition of molecular contact, small ligands contained in the 2005 edition of the PDB were identified and processed. The database was enriched with calculated molecular properties; in particular, ligand atoms were typed automatically. A filtering procedure was applied to select a non-redundant dataset of complexes. Data mining was performed to obtain the frequencies of different types of atomic contacts. Docking simulations were run with the program DOCK.
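The contact-frequency mining step can be illustrated with a minimal sketch. It assumes a plain distance cutoff as the contact criterion, which is not the new definition of molecular contact introduced in this work; the function name `contact_frequencies`, the 4.0 Å cutoff, and the atom-type labels are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import dist


def contact_frequencies(ligand_atoms, protein_atoms, cutoff=4.0):
    """Count (ligand type, protein type) pairs closer than `cutoff` angstroms.

    Assumption: a contact is any atom pair within a fixed distance cutoff.
    The paper uses its own contact definition, which is not reproduced here.
    Each atom is given as a (type_label, (x, y, z)) tuple.
    """
    counts = Counter()
    for lig_type, lig_xyz in ligand_atoms:
        for prot_type, prot_xyz in protein_atoms:
            if dist(lig_xyz, prot_xyz) <= cutoff:
                counts[(lig_type, prot_type)] += 1
    return counts


# Toy coordinates, invented for the example (not taken from any PDB entry).
ligand = [("donor", (0.0, 0.0, 0.0)), ("hydrophobic", (5.0, 0.0, 0.0))]
protein = [
    ("acceptor", (3.0, 0.0, 0.0)),
    ("hydrophobic", (8.0, 0.0, 0.0)),
    ("acceptor", (20.0, 0.0, 0.0)),
]
freqs = contact_frequencies(ligand, protein)
```

Summing such counters over all complexes in a non-redundant set would give database-wide contact frequencies per pair of atom types.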
We compiled a large database of small ligand-protein complexes, enriched with a range of calculated properties, that currently contains more than 6000 non-redundant structures. To demonstrate the value of the new database, we derived a new set of chemical matching rules for use with the program DOCK, based on contact frequencies between ligand atoms and points representing the protein surface, and showed that they are more efficient than the default set of rules included in that program.
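One way such frequency-based matching rules might be derived is sketched below. This is an assumption-laden illustration, not the procedure used in this work: it keeps a ligand-type/surface-point pairing when the observed count exceeds the count expected if types paired at random. The function name `derive_matching_rules`, the `min_ratio` threshold, the type labels, and the counts are all hypothetical, and the output is a plain dictionary rather than a DOCK input file.

```python
def derive_matching_rules(counts, min_ratio=1.0):
    """Map each ligand atom type to the surface-point labels it may match.

    Assumption: a pairing is allowed when observed / expected > min_ratio,
    where `expected` is the count under independent pairing of types.
    `counts` maps (ligand_type, point_label) -> number of observed contacts.
    """
    total = sum(counts.values())
    ligand_totals = {}
    point_totals = {}
    for (lig, point), n in counts.items():
        ligand_totals[lig] = ligand_totals.get(lig, 0) + n
        point_totals[point] = point_totals.get(point, 0) + n

    rules = {}
    for (lig, point), n in counts.items():
        expected = ligand_totals[lig] * point_totals[point] / total
        if n / expected > min_ratio:
            rules.setdefault(lig, set()).add(point)
    return rules


# Invented counts, for illustration only.
example_counts = {
    ("donor", "acceptor"): 80,
    ("donor", "hydrophobic"): 20,
    ("hydrophobic", "acceptor"): 30,
    ("hydrophobic", "hydrophobic"): 70,
}
rules = derive_matching_rules(example_counts)
```

With these toy counts, donor atoms are matched only to acceptor points and hydrophobic atoms only to hydrophobic points, because those pairings occur more often than chance would predict.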
The new database constitutes a valuable resource for the development of knowledge-based docking algorithms and for testing docking programs on large sets of protein-ligand complexes. The new chemical matching rules proposed in this work significantly increase the success rate in docking simulations. The database developed in this work is available at http://cimlcsext.cim.sld.cu:8080/screeningbrowser/.