Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China.
University of Chinese Academy of Sciences, 100049, Beijing, China.
BMC Bioinformatics. 2021 Sep 15;22(1):439. doi: 10.1186/s12859-021-04353-8.
Accurate prediction of protein tertiary structures is highly desirable because knowledge of protein structures provides invaluable insights into protein functions. We have designed two approaches to protein structure prediction: a template-based modeling approach (ProALIGN) and an ab initio prediction approach (ProFOLD). Briefly, ProALIGN aligns a target protein with templates by exploiting the patterns of context-specific alignment motifs and then builds the final structure with reference to the homologous templates. In contrast, ProFOLD uses an end-to-end neural network to estimate the inter-residue distances of a target protein and builds structures that satisfy these distance constraints. The two approaches emphasize different characteristics of target proteins: ProALIGN exploits the structural information of homologous templates, whereas ProFOLD exploits the co-evolutionary information carried by homologous protein sequences. Recent progress has shown that combining template-based modeling with ab initio approaches is promising.
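As an illustration of what it means for a structure to "satisfy" predicted inter-residue distances, the following minimal sketch scores a candidate model against a set of distance restraints. It is not ProFOLD's actual procedure: the function name `restraint_rmsd`, the one-point-per-residue representation, and the dictionary of predicted distances are all assumptions made for this example.

```python
import math


def restraint_rmsd(coords, predicted):
    """Root-mean-square deviation between model distances and predicted ones.

    coords:    list of (x, y, z) tuples, one representative atom per residue
               (hypothetical input format).
    predicted: dict mapping residue index pairs (i, j) to a predicted
               distance in the same units as coords.
    """
    if not predicted:
        raise ValueError("no distance restraints given")
    squared_error = 0.0
    for (i, j), target in predicted.items():
        # Distance actually observed in the candidate model.
        observed = math.dist(coords[i], coords[j])
        squared_error += (observed - target) ** 2
    return math.sqrt(squared_error / len(predicted))


# Toy example: three points forming a 3-4-5 right triangle.
toy_coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
toy_restraints = {(0, 1): 3.0, (1, 2): 4.0, (0, 2): 5.0}
toy_score = restraint_rmsd(toy_coords, toy_restraints)
```

A lower value means the model agrees more closely with the predicted distances; a structure-building step would search for coordinates that drive such a score down.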
In this study, we present FALCON2, a web server that integrates ProALIGN and ProFOLD to provide a high-quality protein structure prediction service. For a target protein, FALCON2 runs ProALIGN and ProFOLD simultaneously to predict candidate structures and selects the most likely one as the final prediction. We evaluated FALCON2 on widely used benchmarks, including 104 CASP13 (the 13th Critical Assessment of protein Structure Prediction) targets and 91 CASP14 targets. In-depth examination suggests that ProALIGN is superior to ProFOLD when high-quality templates are available, whereas ProFOLD performs better in other cases. By integrating these two approaches with their different emphases, the FALCON2 server outperforms each individual approach and achieves state-of-the-art performance compared with existing approaches.
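The run-both-then-select workflow can be sketched as follows. This is a hedged illustration only: the abstract does not state how FALCON2 ranks candidate structures, so the single `score` field, the `Prediction` type, and the two stub predictors below are hypothetical stand-ins rather than the server's real components.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, NamedTuple, Sequence


class Prediction(NamedTuple):
    method: str   # which predictor produced the model
    model: str    # placeholder for the structure (e.g. PDB-format text)
    score: float  # assumed quality estimate; higher means more likely


Predictor = Callable[[str], Prediction]


def predict_structure(sequence: str, predictors: Sequence[Predictor]) -> Prediction:
    """Run all predictors concurrently and return the highest-scoring model."""
    if not predictors:
        raise ValueError("at least one predictor is required")
    with ThreadPoolExecutor(max_workers=len(predictors)) as pool:
        results = list(pool.map(lambda predictor: predictor(sequence), predictors))
    return max(results, key=lambda result: result.score)


# Hypothetical stubs standing in for a template-based and an ab initio predictor.
def template_based_stub(sequence: str) -> Prediction:
    return Prediction("template-based", "MODEL_A", 0.42)


def ab_initio_stub(sequence: str) -> Prediction:
    return Prediction("ab initio", "MODEL_B", 0.67)


best = predict_structure("MKTAYIAKQR", [template_based_stub, ab_initio_stub])
```

Running the predictors in a thread pool mirrors the "simultaneously" in the text; for CPU-bound predictors a process pool or separate jobs would be the more realistic choice.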
By integrating template-based modeling and ab initio approaches, FALCON2 provides an easy-to-use, high-quality protein structure prediction service for the community, and we expect it to enable deeper insights into protein functions.