Nayal Murad, Honig Barry
Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, USA.
Proteins. 2006 Jun 1;63(4):892-906. doi: 10.1002/prot.20897.
In this article we introduce a new method for the identification and the accurate characterization of protein surface cavities. The method is encoded in the program SCREEN (Surface Cavity REcognition and EvaluatioN). As a first test of the utility of our approach we used SCREEN to locate and analyze the surface cavities of a nonredundant set of 99 proteins cocrystallized with drugs. We find that this set of proteins has on average about 14 distinct cavities per protein. In all cases, a drug is bound at one (and sometimes more than one) of these cavities. Using cavity size alone as a criterion for predicting drug-binding sites yields a high balanced error rate of 15.7%, with only 71.7% coverage. Here we characterize each surface cavity by computing a comprehensive set of 408 physicochemical, structural, and geometric attributes. By applying modern machine learning techniques (Random Forests) we were able to develop a classifier that can identify drug-binding cavities with a balanced error rate of 7.2% and coverage of 88.9%. Only 18 of the 408 cavity attributes had a statistically significant role in the prediction. Of these 18 important attributes, almost all involved size and shape rather than physicochemical properties of the surface cavity. The implications of these results are discussed. A SCREEN Web server is available at http://interface.bioc.columbia.edu/screen.
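The abstract reports balanced error rates without defining the metric. The sketch below assumes the common definition: the mean of the miss rate on drug-binding cavities and the false-alarm rate on all other cavities. The paper's exact formula may differ. The function name and all counts are hypothetical and are not taken from the study.

```python
def balanced_error_rate(tp, fn, tn, fp):
    """Average of the per-class error rates (assumed definition, see lead-in).

    tp, fn: drug-binding cavities predicted correctly / missed
    tn, fp: other cavities predicted correctly / wrongly flagged
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    miss_rate = fn / (tp + fn)         # error on the positive class
    false_alarm_rate = fp / (tn + fp)  # error on the negative class
    return 0.5 * (miss_rate + false_alarm_rate)


# Hypothetical counts for illustration only (not data from the paper).
ber = balanced_error_rate(tp=90, fn=10, tn=1200, fp=100)

# A classifier that labels every cavity "not drug-binding" gets high plain
# accuracy when negatives dominate, but its balanced error rate is 0.5.
ber_trivial = balanced_error_rate(tp=0, fn=100, tn=1300, fp=0)
```

With roughly 14 cavities per protein and typically only one or a few of them drug-binding, the classes are strongly imbalanced, which is one plausible reason to prefer a balanced error rate over plain accuracy.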