Division of Biosciences, Institute of Structural and Molecular Biology, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK.
Molecules. 2021 Feb 12;26(4):966. doi: 10.3390/molecules26040966.
Zinc binding proteins make up a significant proportion of the proteomes of most organisms and, within those proteins, zinc performs roles in catalysis and structure stabilisation. Identifying the ability to bind zinc in a novel protein can offer insights into its functions and the mechanism by which it carries out those functions. Computational methods for doing so are faster than spectroscopic methods, allowing searches at much greater speed and scale, and thereby guiding complementary experimental approaches. Typically, computational models of zinc binding predict zinc binding for individual residues rather than for a binding site as a whole, and typically do not distinguish between different classes of binding site, thereby missing crucial properties indicative of zinc binding. Previously, we created ZincBindDB, a continuously updated database of known zinc binding sites, categorised by family (the set of liganding residues). Here, we use this dataset to create ZincBindPredict, a set of machine learning methods to predict the most common zinc binding site families for both structure and sequence. The structural models all achieve a Matthews correlation coefficient (MCC) ≥ 0.88, recall ≥ 0.93 and precision ≥ 0.91 (mean MCC = 0.97), while the sequence models achieve MCC ≥ 0.64, recall ≥ 0.80 and precision ≥ 0.83 (mean MCC = 0.87), with the models for binding sites containing four liganding residues performing much better than these minima. The predictors outperform competing zinc binding site predictors and are available online via a web interface and a GraphQL API.
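For readers unfamiliar with the reported metrics, the following minimal sketch shows how precision, recall and MCC are conventionally computed from the counts of a binary confusion matrix. It is an illustration of the standard definitions only, not the evaluation code used for ZincBindPredict; the function name `binary_metrics` and its arguments are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (precision, recall, mcc) from binary confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Undefined ratios (zero denominator) are reported as
    0.0, which is an assumption made for this sketch.
    """
    # Precision: fraction of predicted binding sites that are real.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of real binding sites that were predicted.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC: correlation between predictions and labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    p, r, m = binary_metrics(tp=90, fp=10, tn=95, fn=5)
    print(f"precision={p:.3f} recall={r:.3f} mcc={m:.3f}")
```

Unlike precision and recall, MCC uses all four cells of the confusion matrix, which makes it a useful summary when positive and negative classes are imbalanced.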