

MAHOMES II: A webserver for predicting if a metal binding site is enzymatic.

Author information

Feehan Ryan, Copeland Matthew, Franklin Meghan W, Slusky Joanna S G

Affiliations

Center for Computational Biology, The University of Kansas, 2030 Becker Dr., Lawrence, KS 66047.

Department of Molecular Biosciences, The University of Kansas, 1200 Sunnyside Ave. Lawrence KS 66045-3101.

Publication information

bioRxiv. 2023 Mar 12:2023.03.08.531790. doi: 10.1101/2023.03.08.531790.

Abstract


Recent advances have enabled high-quality computationally generated structures for proteins with no solved crystal structures. However, protein function data remains largely limited to experimental methods and homology mapping. Because structure determines function, methods that can use computationally generated structures for functional annotation need to advance. Our laboratory recently developed a method to distinguish between metalloenzyme and non-enzyme sites. Here we report two improvements to this method: upgrading our physicochemical features to alleviate the need for structures with sub-angstrom precision, and using machine learning to reduce training data labeling error. Our improved classifier identifies protein-bound metal sites as enzymatic or non-enzymatic with 94% precision and 92% recall. We demonstrate that both adjustments increased predictive performance and reliability on sites with sub-angstrom variations. We constructed a set of predicted metalloprotein structures with no solved crystal structures and no detectable homology to our training data. On this set, our model had an accuracy of 90-97.5%, depending on the quality of the predicted structures included in the test. Finally, we found that the physicochemical trends driving this model's successful performance were local protein density, second-shell ionizable residue burial, and the pocket's accessibility to the site. We anticipate that our model's ability to correctly identify catalytic metal sites could enable identification of new enzymatic mechanisms and improve metalloenzyme design success rates.
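For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how precision and recall are computed for a binary enzymatic/non-enzymatic classifier. The function name and the counts are hypothetical, chosen only for illustration; they are not taken from the paper's data or code.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) for the positive (enzymatic) class.

    tp: enzymatic sites predicted enzymatic
    fp: non-enzymatic sites predicted enzymatic
    fn: enzymatic sites predicted non-enzymatic
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for illustration only (not from the paper).
tp, fp, fn = 46, 3, 4
p, r = precision_recall(tp, fp, fn)
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision answers "of the sites called enzymatic, how many truly are?", while recall answers "of the truly enzymatic sites, how many were found?".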

SIGNIFICANCE STATEMENT

Identifying enzyme active sites on proteins with unsolved crystallographic structures can accelerate the discovery of novel biochemical reactions, which can impact healthcare, industrial processes, and environmental remediation. Our lab has developed a machine learning tool that predicts whether sites on computationally generated protein structures are enzymatic or non-enzymatic. We have made our tool available on a webserver, allowing the scientific community to rapidly search previously unknown protein function space.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/defc/10028950/df8ecc389faf/nihpp-2023.03.08.531790v1-f0001.jpg
