Costa Francesco, Blum Matthias, Bateman Alex
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, United Kingdom.
Bioinform Adv. 2024 Nov 25;4(1):vbae188. doi: 10.1093/bioadv/vbae188. eCollection 2024.
High-confidence structure prediction models have become available for nearly all protein sequences, and more than 200 million AlphaFold2 models are now public. We observe that prediction confidence, as judged by pLDDT scores, can vary significantly across a protein family. We explored whether the lower-pLDDT predictions in a family can be improved by supplying higher-pLDDT models from the same family as template structures in AlphaFold2.
Our work shows that about one-third of the time, structures with a low pLDDT can be "rescued," that is, moved from low to reasonable confidence. Surprisingly, in many cases we obtain a higher-pLDDT model when we switch off the multiple sequence alignment (MSA) option in AlphaFold2 and rely solely on a high-quality template. However, we find that the best overall strategy is to make predictions both with and without the MSA information and select the model with the highest average pLDDT. We also find that using high-pLDDT models as templates can increase the speed of AlphaFold2 as implemented in ColabFold. Additionally, we present evidence that, beyond the increase in overall pLDDT, the resulting models are likely to be of higher structural quality as judged by two metrics.
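The selection step described above (predict with and without the MSA, then keep the model with the highest average pLDDT) can be sketched as follows. This is a minimal illustration and not the AF2Fix pipeline itself: the function names `mean_plddt` and `select_best_model` and the run labels are hypothetical. It assumes the models are PDB-format text in which the per-residue pLDDT is stored in the B-factor column, as in AlphaFold2 output, and it averages over C-alpha atoms so that each residue is counted once.

```python
from __future__ import annotations

from statistics import mean


def mean_plddt(pdb_text: str) -> float:
    """Average pLDDT over CA atoms, read from the PDB B-factor column."""
    scores = [
        float(line[60:66])  # B-factor field, PDB columns 61-66
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    ]
    if not scores:
        raise ValueError("no CA atoms found in model")
    return mean(scores)


def select_best_model(models: dict[str, str]) -> tuple[str, float]:
    """Return (label, mean pLDDT) for the model with the highest mean pLDDT.

    `models` maps a hypothetical run label (for example "with_msa" or
    "template_only") to the PDB text produced by that run.
    """
    scored = {label: mean_plddt(text) for label, text in models.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]
```

In use, one would read the PDB file written by each run into a string, pass both to `select_best_model`, and keep the returned label.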
We have implemented our pipeline in Nextflow, and it is available on GitHub: https://github.com/FranceCosta/AF2Fix.