Keeping it in the family: using protein family templates to rescue low confidence AlphaFold2 models.

Authors

Costa Francesco, Blum Matthias, Bateman Alex

Affiliation

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, United Kingdom.

Publication

Bioinform Adv. 2024 Nov 25;4(1):vbae188. doi: 10.1093/bioadv/vbae188. eCollection 2024.

Abstract

MOTIVATION

High-confidence structure prediction models are now available for nearly all protein sequences, with more than 200 million AlphaFold2 models publicly accessible. We observe that prediction confidence, as judged by pLDDT scores, can vary significantly across a protein family. We explored whether lower-pLDDT predictions in a family can be improved by supplying higher-pLDDT models from the same family as template structures in AlphaFold2.

RESULTS

Our work shows that about one-third of the time, structures with a low pLDDT can be "rescued," that is, moved from low to reasonable confidence. Surprisingly, in many cases we obtain a higher-pLDDT model when we switch off the multiple sequence alignment (MSA) option in AlphaFold2 and rely solely on a high-quality template. However, the best overall strategy is to make predictions both with and without the MSA information and select the model with the highest average pLDDT. We also find that using high-pLDDT models as templates can increase the speed of AlphaFold2 as implemented in ColabFold. Additionally, we present evidence that, beyond the increase in overall pLDDT, the models are likely to have higher-quality structures as judged by two metrics.
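The selection step described above (predict with and without the MSA, then keep the model with the highest average pLDDT) can be sketched in a few lines of Python. This is a minimal illustration, not the AF2Fix pipeline's own code: the function names `mean_plddt` and `pick_best` and the run labels are hypothetical, and the sketch assumes the models are PDB-format text in which per-residue pLDDT is stored in the B-factor column, as in AlphaFold2 and ColabFold output files.

```python
from statistics import mean


def mean_plddt(pdb_text: str) -> float:
    """Average pLDDT over residues, read from the B-factor field of CA atoms.

    Assumes PDB fixed-column format: atom name in columns 13-16 and
    B-factor (here pLDDT) in columns 61-66.
    """
    scores = [
        float(line[60:66])
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    ]
    if not scores:
        raise ValueError("no CA atoms found in model")
    return mean(scores)


def pick_best(models: dict[str, str]) -> tuple[str, float]:
    """Return (label, mean pLDDT) for the model with the highest mean pLDDT.

    `models` maps a hypothetical run label (for example "with_msa" or
    "template_only") to the PDB text produced by that run.
    """
    scored = {label: mean_plddt(text) for label, text in models.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]
```

Given the PDB text of the two runs for one target, `pick_best({"with_msa": a, "template_only": b})` would return the label and mean pLDDT of the run to keep. Averaging over CA atoms only counts each residue once, which is one reasonable reading of "average pLDDT"; the abstract does not specify how the average is computed.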

AVAILABILITY AND IMPLEMENTATION

We have implemented our pipeline in Nextflow, and it is available on GitHub: https://github.com/FranceCosta/AF2Fix.
