Department of Computational Medicine and Bioinformatics and Department of Biological Chemistry, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109-2218, USA.
Bioinformatics. 2013 Oct 15;29(20):2588-95. doi: 10.1093/bioinformatics/btt447. Epub 2013 Aug 23.
Identification of protein-ligand binding sites is critical to protein function annotation and drug discovery. However, no single method generates optimal binding site predictions across different protein types. Combining complementary predictions is probably the most reliable solution to the problem.
We develop two new methods, one based on binding-specific substructure comparison (TM-SITE) and another on sequence profile alignment (S-SITE), for complementary binding site predictions. The methods are tested on a set of 500 non-redundant proteins harboring 814 natural, drug-like and metal ion molecules. Starting from low-resolution protein structure predictions, the methods successfully recognize >51% of binding residues, with an average Matthews correlation coefficient (MCC) significantly higher (P-value < 10^-9 in Student's t-test) than that of other state-of-the-art methods, including COFACTOR, FINDSITE and ConCavity. When TM-SITE and S-SITE are combined with other structure-based programs, a consensus approach (COACH) can increase MCC by 15% over the best individual predictions. COACH was examined in the recent community-wide CAMEO experiment and consistently ranked as the best method in the last 22 individual datasets, with an Area Under the Curve score 22.5% higher than that of the second-best method. These data demonstrate a new robust approach to protein-ligand binding site recognition, which is ready for genome-wide structure-based function annotations.
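The MCC used above to score binding residue predictions can be illustrated with a minimal sketch. It assumes each prediction is a set of residue indices and that every residue in the chain counts as either binding or non-binding. The function name `binding_site_mcc` and the example indices are hypothetical, and this is not the authors' evaluation code.

```python
import math


def binding_site_mcc(predicted, actual, n_residues):
    """Matthews correlation coefficient for per-residue binding site prediction.

    predicted, actual: iterables of residue indices labeled as binding.
    n_residues: total number of residues in the protein chain.
    """
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)      # binding residues correctly predicted
    fp = len(predicted - actual)      # predicted binding, actually non-binding
    fn = len(actual - predicted)      # binding residues that were missed
    tn = n_residues - tp - fp - fn    # non-binding residues correctly left out
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when the denominator vanishes
    # (for example, when nothing is predicted as binding).
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical 100-residue protein: 3 of 4 true binding residues recovered,
# with 1 false positive.
mcc = binding_site_mcc({10, 11, 12, 13}, {11, 12, 13, 14}, 100)
print(f"MCC = {mcc:.3f}")
```

MCC is well suited to this task because binding residues are usually a small minority of the chain, so a score that ignores true negatives or rewards always predicting "non-binding" would be misleading.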