Center for Computational Biology, The University of Kansas, 2030 Becker Dr., Lawrence, Kansas 66047, USA.
Department of Molecular Biosciences, The University of Kansas, 1200 Sunnyside Ave., Lawrence, Kansas 66045-3101, USA.
Protein Sci. 2023 Apr;32(4):e4626. doi: 10.1002/pro.4626.
Recent advances have enabled high-quality computationally generated structures for proteins with no solved crystal structures. However, protein function data remain largely limited to experimental methods and homology mapping. Since structure determines function, methods that can use computationally generated structures for functional annotation need to be advanced. Our laboratory recently developed a method to distinguish between metalloenzyme and nonenzyme sites. Here we report improvements to this method: we upgraded our physicochemical features to alleviate the need for structures with sub-angstrom precision, and we used machine learning to reduce training data labeling error. Our improved classifier identifies protein-bound metal sites as enzymatic or nonenzymatic with 94% precision and 92% recall. We demonstrate that both adjustments increased predictive performance and reliability on sites with sub-angstrom variations. We constructed a set of predicted metalloprotein structures with no solved crystal structures and no detectable homology to our training data. On this set, our model had an accuracy of 90%-97.5%, depending on the quality of the predicted structures included in the test. Finally, we found that the physicochemical trends driving this model's successful performance were local protein density, second-shell ionizable residue burial, and the pocket's accessibility to the site. We anticipate that our model's ability to correctly identify catalytic metal sites could enable identification of new enzymatic mechanisms and improve de novo metalloenzyme design success rates.
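For readers less familiar with the reported metrics, the sketch below shows how precision, recall, and accuracy are conventionally computed for a binary site classifier. It is illustrative only and is not the authors' code. It assumes enzymatic sites are the positive class (label 1) and nonenzymatic sites the negative class (label 0). The function name `binary_metrics` and the toy labels are hypothetical and are not data from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute precision, recall, and accuracy for 0/1 labels.

    Assumption: 1 = enzymatic site (positive), 0 = nonenzymatic site.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # enzymatic, predicted enzymatic
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # nonenzymatic, predicted enzymatic
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # enzymatic, predicted nonenzymatic
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # nonenzymatic, predicted nonenzymatic

    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Hypothetical toy labels for eight metal sites (not results from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Precision is the fraction of sites predicted enzymatic that truly are enzymatic, and recall is the fraction of truly enzymatic sites that the classifier recovers. The two are reported together because a classifier can raise one at the expense of the other.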