Abbass Jad, Nebel Jean-Christophe
Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE, UK.
BMC Bioinformatics. 2015 Apr 29;16(1):136. doi: 10.1186/s12859-015-0576-2.
Since experimental techniques are time-consuming and costly, in silico protein structure prediction is essential to produce conformations of protein targets. When homologous structures are not available, fragment-based protein structure prediction has become the approach of choice. However, it still suffers from several issues, including poor performance on targets longer than 100 residues, excessive running times and sub-optimal energy functions. Taking advantage of the reliable performance of structural class prediction software, we propose to address some of the limitations of fragment-based methods by integrating structural constraints into their fragment selection process.
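The idea of constraining fragment selection by structural class can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's actual implementation: the `Fragment` record, the `pick_fragments` and `similarity` helpers, the class labels and the identity-based toy score are all hypothetical stand-ins for a real fragment picker (such as Rosetta's) and its scoring. The only point shown is the order of operations: discard candidate fragments whose source protein does not share the target's predicted class, then rank what remains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    source_id: str      # hypothetical identifier of the template protein
    sequence: str       # template residues covered by the fragment
    struct_class: str   # assumed CATH/SCOP-style class label of the template


def similarity(a: str, b: str) -> int:
    """Toy score: number of identical residues at aligned positions."""
    return sum(x == y for x, y in zip(a, b))


def pick_fragments(target_window: str, candidates: list, predicted_class: str, top_n: int = 3) -> list:
    """Keep only candidates from the predicted class, then rank them by the toy score."""
    same_class = [f for f in candidates if f.struct_class == predicted_class]
    ranked = sorted(same_class, key=lambda f: similarity(target_window, f.sequence), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    library = [
        Fragment("1abc", "AKLVE", "alpha"),
        Fragment("2xyz", "AKLVE", "beta"),   # identical sequence, wrong class: excluded
        Fragment("3def", "AKLIE", "alpha"),
        Fragment("4ghi", "GGGGG", "alpha"),
    ]
    for frag in pick_fragments("AKLVE", library, "alpha", top_n=2):
        print(frag.source_id, similarity("AKLVE", frag.sequence))
```

Filtering before ranking means a sequence-similar fragment from a template of a different class (here `2xyz`) never enters the candidate pool, which is the sense in which class information shrinks the space the search has to explore.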
Using Rosetta, a state-of-the-art fragment-based protein structure prediction package, we evaluated our proposed pipeline on 70 former CASP targets containing up to 150 amino acids. With either CATH- or SCOP-based structural class annotations, the improvement in structure prediction performance is highly significant in terms of both GDT_TS (at least +2.6, p-values < 0.0005) and RMSD (-0.4, p-values < 0.005). Although the CATH and SCOP classifications differ, they perform similarly. Moreover, proteins from all structural classes benefit from the proposed methodology. Further analysis also shows that methods relying on class-based fragments produce conformations that are more relevant to the user and converge more quickly towards the best model, as estimated by GDT_TS (up to 10% on average). This supports our hypothesis that using structurally relevant templates not only reduces the size of the conformational space to be explored but also focuses the search on a more relevant region.
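To make the two evaluation metrics concrete, the sketch below computes simplified GDT_TS and RMSD values from per-residue deviations. It assumes the model and reference Cα coordinates are already residue-matched and superposed in a single frame; the standard GDT_TS used in CASP (computed with LGA) instead searches for superpositions that maximize the number of residues under each cutoff, so this single-frame version typically gives a lower value, and the RMSD here is computed without re-fitting. The function names are illustrative only.

```python
import math

# GDT_TS averages the fraction of residues within these distance cutoffs (in angstroms).
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def deviations(model, reference):
    """Per-residue Euclidean distances between matched, pre-superposed coordinates."""
    if len(model) != len(reference) or not model:
        raise ValueError("model and reference must be non-empty and residue-matched")
    return [math.dist(p, q) for p, q in zip(model, reference)]


def rmsd(model, reference):
    """Root-mean-square deviation in the given frame (no optimal superposition)."""
    d = deviations(model, reference)
    return math.sqrt(sum(x * x for x in d) / len(d))


def gdt_ts(model, reference, cutoffs=GDT_TS_CUTOFFS):
    """Simplified GDT_TS (0-100): mean fraction of residues within each cutoff."""
    d = deviations(model, reference)
    fractions = [sum(x <= c for x in d) / len(d) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    ref = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
    mod = [(0, 0, 0), (1, 1.5, 0), (2, 3, 0), (3, 9, 0)]
    print(f"GDT_TS = {gdt_ts(mod, ref):.2f}, RMSD = {rmsd(mod, ref):.2f}")
```

In this toy case the deviations are 0, 1.5, 3 and 9 Å, so the fractions under the four cutoffs are 0.25, 0.5, 0.75 and 0.75, giving a GDT_TS of 56.25. A higher GDT_TS and a lower RMSD both indicate a model closer to the reference, which is why the reported gains appear as a positive GDT_TS change and a negative RMSD change.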
Since our methodology produces models whose quality is on average up to 7% higher than that of models generated by a standard fragment-based predictor, we believe it should be considered before conducting any fragment-based protein structure prediction. Despite such progress, ab initio prediction remains a challenging task, especially for medium-sized and large proteins. Apart from improving search strategies and energy functions, integrating additional constraints appears to be a promising route, especially if they can be accurately predicted from sequence alone.