Department of Biochemistry, Indian Institute of Science, Bangalore 560012, Karnataka, India and IISc Mathematics Initiative, Indian Institute of Science, Bangalore 560012, Karnataka, India.
Database (Oxford). 2014 Apr 23;2014(0):bau029. doi: 10.1093/database/bau029. Print 2014.
Most biological processes are governed by specific protein-ligand interactions. Discerning the components that contribute to a favorable protein-ligand interaction could contribute significantly to a better understanding of protein function, to rationalizing drug design and to obtaining design principles for protein engineering. The Protein Data Bank (PDB) currently hosts the structures of ∼68 000 protein-ligand complexes. Although several databases classify proteins according to sequence and structure, only a handful annotate and classify protein-ligand interactions and provide information on different attributes of molecular recognition. In this study, an exhaustive comparison of all the biologically relevant ligand-binding sites (84 846 sites) has been conducted using PocketMatch, a rapid, parallel, in-house algorithm. PocketMatch quantifies the similarity between binding sites based on structural descriptors and residue attributes. A similarity network was constructed from binding sites whose PocketMatch scores exceeded a high similarity threshold (0.80). The binding site similarity network was clustered into discrete sets of similar sites using the Markov clustering (MCL) algorithm. Furthermore, various computational tools have been used to study different attributes of interactions within the individual clusters. The attributes can be roughly divided into (i) binding site characteristics, including pocket shape, nature of residues and interaction profiles with different kinds of atomic probes; (ii) atomic contacts, consisting of various types of polar, hydrophobic and aromatic contacts along with binding site water molecules that could play crucial roles in protein-ligand interactions; and (iii) binding energetics of the interactions, derived from scoring functions developed for docking.
For each ligand-binding site in each protein in the PDB, site similarity information, the clusters it belongs to and a description of site attributes are provided as a relational database, Protein-Ligand Interaction Clusters (PLIC). Database URL: http://proline.biochem.iisc.ernet.in/PLIC.
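The network construction and clustering steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the site names and pairwise scores are invented toy values standing in for PocketMatch output, the edges are treated as unweighted, and the MCL routine is a simplified textbook version (expansion by matrix squaring, inflation by element-wise powering) with an assumed inflation value of 2.0. Only the 0.80 threshold comes from the abstract.

```python
THRESHOLD = 0.80  # similarity cutoff quoted in the abstract


def build_adjacency(scores, threshold=THRESHOLD):
    """Binary adjacency matrix (with self-loops) for pairs scoring above threshold."""
    nodes = sorted({site for a, b, _ in scores for site in (a, b)})
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b, score in scores:
        if score > threshold:  # "exceeded" the threshold
            matrix[index[a]][index[b]] = 1.0
            matrix[index[b]][index[a]] = 1.0
    return nodes, matrix


def normalize_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    for j in range(n):
        total = sum(matrix[i][j] for i in range(n))
        for i in range(n):
            matrix[i][j] /= total
    return matrix


def expand(matrix):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def inflate(matrix, power):
    """Inflation step: element-wise power, then renormalize the columns."""
    return normalize_columns([[x ** power for x in row] for row in matrix])


def mcl(matrix, inflation=2.0, max_iter=100, tol=1e-9):
    """Alternate expansion and inflation until the matrix stops changing."""
    current = normalize_columns([row[:] for row in matrix])
    n = len(current)
    for _ in range(max_iter):
        updated = inflate(expand(current), inflation)
        delta = max(abs(updated[i][j] - current[i][j])
                    for i in range(n) for j in range(n))
        current = updated
        if delta < tol:
            break
    return current


def extract_clusters(matrix, nodes, eps=1e-6):
    """Rows with a non-zero diagonal are attractors; their non-zero columns form a cluster."""
    found = set()
    for i, row in enumerate(matrix):
        if row[i] > eps:
            found.add(frozenset(nodes[j] for j, x in enumerate(row) if x > eps))
    return sorted(sorted(cluster) for cluster in found)


# Hypothetical pairwise similarity scores (site_1, site_2, score); toy values only.
scores = [
    ("site_A", "site_B", 0.91),
    ("site_A", "site_C", 0.86),
    ("site_B", "site_C", 0.83),
    ("site_D", "site_E", 0.95),
    ("site_C", "site_D", 0.55),  # below the threshold, so no edge is created
]

nodes, adjacency = build_adjacency(scores)
clusters = extract_clusters(mcl(adjacency), nodes)
print(clusters)
```

In this toy input the single below-threshold pair is dropped, so the two groups of sites are never linked and end up in separate clusters. At the scale of tens of thousands of sites, a sparse-matrix MCL implementation would be needed instead of the dense lists used here.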