Bordner Andrew J
Mayo Clinic, 13400 East Shea Boulevard, Scottsdale, AZ 85259, USA.
Bioinformatics. 2008 Dec 15;24(24):2865-71. doi: 10.1093/bioinformatics/btn543. Epub 2008 Oct 21.
Specific non-covalent binding of metal ions and ligands, such as nucleotides and cofactors, is essential for the function of many proteins. Computational methods are useful for predicting the location of such binding sites when experimental information is lacking. Methods that use structural information, when available, are particularly promising since they can potentially identify non-contiguous binding motifs that cannot be found using only the amino acid sequence. Furthermore, a prediction method that can utilize low-resolution models is advantageous because high-resolution structures are available for only a relatively small fraction of proteins.
SitePredict is a machine learning-based method for predicting binding sites in protein structures for specific metal ions or small molecules. The method uses Random Forest classifiers trained on diverse residue-based site properties, including spatial clustering of residue types and evolutionary conservation. SitePredict was tested by cross-validation on a set of known binding sites for six different metal ions and five different small molecules in a non-redundant set of protein-ligand complex structures. Prediction performance was good for all ligands considered, as reflected by values of the area under the ROC curve (AUC) of at least 0.8. Furthermore, a more realistic test on unbound structures showed only a slight decrease in accuracy. The properties that contribute the most to the prediction accuracy for each ligand were also examined. Finally, examples of predicted binding sites in homology models and uncharacterized proteins are discussed.
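To make the AUC figure concrete, the following minimal sketch computes the area under the ROC curve from classifier scores using the rank-sum (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen true binding site receives a higher score than a randomly chosen non-site. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not SitePredict code or data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) statistic.

    labels: 1 for a true binding site, 0 for a non-site (hypothetical encoding).
    scores: classifier scores, higher meaning "more likely a binding site".
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores pairs[i:j].
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # 1-based ranks i+1 .. j share the average rank (i + 1 + j) / 2.
        avg_rank = (i + 1 + j) / 2.0
        positives_in_run = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_run
        i = j

    # U counts (positive, negative) pairs ranked correctly; ties count 0.5.
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


# Toy example with made-up scores for four candidate sites.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.1]
print(roc_auc(toy_labels, toy_scores))
```

Under this reading, an AUC of 0.8 means that a true site outranks a non-site in 80% of such pairs, while 0.5 corresponds to random ranking.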
Binding site prediction results for all PDB protein structures and human protein homology models are available at http://sitepredict.org/.