Center for the Study of Systems Biology, Georgia Institute of Technology, Atlanta, Georgia 30318, USA.
Proteins. 2011 Mar;79(3):735-51. doi: 10.1002/prot.22913. Epub 2010 Dec 6.
The rapid accumulation of gene sequences, many of which encode hypothetical proteins of unknown function, has stimulated the development of accurate computational tools for protein function prediction, with evolution/structure-based approaches showing considerable promise. In this article, we present FINDSITE-metal, a new threading-based method designed specifically to detect metal-binding sites in modeled protein structures. Comprehensive benchmarks using protein structures of varying quality show that weakly homologous protein models provide sufficient structural information for quite accurate annotation by FINDSITE-metal. Combining structure/evolutionary information with machine learning results in highly accurate metal-binding annotations; for protein models constructed by TASSER, whose average Cα RMSD from the native structure is 8.9 Å, 59.5% (71.9%) of the best of top five predicted metal locations are within 4 Å (8 Å) of a bound metal in the crystal structure. For most of the targets, multiple metal-binding sites are detected, with the best predicted binding site at rank 1 and within the top two ranks in 65.6% and 83.1% of the cases, respectively. Furthermore, for iron, copper, zinc, calcium, and magnesium ions, the binding metal can be predicted with high accuracy, typically 70% to 90%. FINDSITE-metal also provides a set of confidence indexes that help assess the reliability of predictions. Finally, we describe the proteome-wide application of FINDSITE-metal, which quantifies the metal-binding complement of the human proteome. FINDSITE-metal is freely available to the academic community at http://cssb.biology.gatech.edu/findsite-metal/.
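The benchmark figures quoted in the abstract (best of top five predictions within 4 Å or 8 Å of a bound metal, and the rank of the best predicted site) can be illustrated with a minimal Python sketch. This is not the FINDSITE-metal code. It assumes that the distance is the Euclidean distance between a predicted metal location and the nearest bound metal in the crystal structure, and that predictions are supplied in rank order. The function names and the toy coordinates are hypothetical and serve only to show the arithmetic.

```python
import math

def best_of_top_n_distance(predicted, observed, top_n=5):
    """Smallest distance (in Å) from any of the top_n ranked predictions
    to any observed metal position. `predicted` is assumed to be rank-ordered."""
    return min(math.dist(p, m) for p in predicted[:top_n] for m in observed)

def rank_of_best_site(predicted, observed):
    """1-based rank of the predicted site closest to any observed metal."""
    dists = [min(math.dist(p, m) for m in observed) for p in predicted]
    return dists.index(min(dists)) + 1

def fraction_within(targets, cutoff, top_n=5):
    """Fraction of targets whose best-of-top-n prediction lies within `cutoff` Å."""
    hits = sum(best_of_top_n_distance(p, o, top_n) <= cutoff for p, o in targets)
    return hits / len(targets)

# Hypothetical toy data: (ranked predicted sites, observed metal positions) per target.
toy_targets = [
    ([(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)]),  # best distance 3 Å
    ([(0.0, 0.0, 0.0), (20.0, 0.0, 0.0)], [(6.0, 0.0, 0.0)]),  # best distance 6 Å
]

print(fraction_within(toy_targets, cutoff=4.0))  # share of targets within 4 Å
print(fraction_within(toy_targets, cutoff=8.0))  # share of targets within 8 Å
```

Applied to a real benchmark set, the two calls to `fraction_within` would correspond to the 4 Å and 8 Å percentages, and tallying `rank_of_best_site` over targets would give the rank 1 and top-two statistics.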