Anishchenko Ivan, Kundrotas Petras J, Tuzikov Alexander V, Vakser Ilya A
Center for Bioinformatics, The University of Kansas, Lawrence, Kansas, 66047.
United Institute of Informatics Problems, National Academy of Sciences, Minsk, 220012, Belarus.
Proteins. 2015 Sep;83(9):1563-70. doi: 10.1002/prot.24736. Epub 2015 Jun 13.
Structural characterization of protein-protein interactions is important for understanding life processes. Because of the inherent limitations of experimental techniques, such characterization requires computational approaches. Along with the traditional protein-protein docking (free search for a match between two proteins), comparative (template-based) modeling of protein-protein complexes has been gaining popularity. Its development puts an emphasis on full and partial structural similarity between the target protein monomers and the protein-protein complexes previously determined by experimental techniques (templates). The template-based docking relies on the quality and diversity of the template set. We present a carefully curated, nonredundant library of templates containing 4950 full structures of binary complexes and 5936 protein-protein interfaces extracted from the full structures at 12 Å distance cut-off. Redundancy in the libraries was removed by clustering the PDB structures based on structural similarity. The value of the clustering threshold was determined from the analysis of the clusters and the docking performance on a benchmark set. High structural quality of the interfaces in the template and validation sets was achieved by automated procedures and manual curation. The library is included in the Dockground resource for molecular recognition studies at http://dockground.bioinformatics.ku.edu.
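The interface extraction mentioned above can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract gives only the 12 Å cut-off, so the choice of atom-to-atom distances, the `Atom` record, and the `interface_residues` function are all hypothetical, and the coordinates are toy values rather than PDB data.

```python
import math
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    """Hypothetical minimal atom record (not a real PDB parser type)."""
    chain: str
    residue: int
    x: float
    y: float
    z: float


CUTOFF = 12.0  # distance cut-off in Å, the value stated in the abstract


def interface_residues(
    chain_a: List[Atom], chain_b: List[Atom], cutoff: float = CUTOFF
) -> Tuple[List[int], List[int]]:
    """Return residue numbers of each chain that lie within `cutoff` of the other.

    Assumption: a residue belongs to the interface if any of its atoms is
    within `cutoff` of any atom of the partner chain.
    """
    res_a, res_b = set(), set()
    for a in chain_a:
        for b in chain_b:
            # Euclidean distance between the two atoms
            if math.dist((a.x, a.y, a.z), (b.x, b.y, b.z)) <= cutoff:
                res_a.add(a.residue)
                res_b.add(b.residue)
    return sorted(res_a), sorted(res_b)


# Toy two-chain "complex": only residue 1 (chain A) and residue 10 (chain B)
# are close enough to count as interface residues at 12 Å.
toy_a = [Atom("A", 1, 0.0, 0.0, 0.0), Atom("A", 2, 30.0, 0.0, 0.0)]
toy_b = [Atom("B", 10, 5.0, 0.0, 0.0), Atom("B", 11, 50.0, 0.0, 0.0)]
print(interface_residues(toy_a, toy_b))
```

The all-pairs loop is quadratic in the number of atoms; for real structures a spatial index (for example a k-d tree) would be the usual choice, but the brute-force form keeps the cut-off logic easy to read.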