Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK.
Bioinformatics. 2017 Sep 1;33(17):2684-2690. doi: 10.1093/bioinformatics/btx217.
Protein fold recognition is often trivial when appropriate, evolutionarily related structural templates can be identified, and may even be viewed as a solved problem. However, in cases where no homologous structural templates can be detected, fold recognition is a notoriously difficult problem (Moult et al., 2014). Here we present EigenTHREADER, a novel fold recognition method capable of identifying folds where no homologous structures can be identified. EigenTHREADER takes a query amino acid sequence, generates a map of inter-residue contacts, and then searches a library of contact maps of known structures. To allow the contact maps to be compared, we use eigenvector decomposition to resolve the principal eigenvectors, which can then be aligned using standard dynamic programming algorithms. The approach is similar to the Al-Eigen approach of Di Lena et al. (2010), but with improvements made to both speed and accuracy. With this search strategy, EigenTHREADER does not depend directly on sequence homology between the target protein and entries in the fold library to generate models. This in turn enables EigenTHREADER to correctly identify analogous folds where little or no sequence homology information is available.
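The contact-map comparison described above can be illustrated with a minimal Python sketch. It is not the EigenTHREADER implementation. The function names, the choice of three eigenvectors, the dot-product match score, the linear gap penalty and the toy banded contact maps are all assumptions made purely for illustration.

```python
import numpy as np


def principal_eigenvectors(contact_map, k=3):
    """Return the k principal eigenvectors of a symmetric contact map.

    Columns are scaled by sqrt(|eigenvalue|) and sign-normalised, because an
    eigenvector and its negation are equally valid. (Illustrative choices.)
    """
    cmap = np.asarray(contact_map, dtype=float)
    values, vectors = np.linalg.eigh(cmap)           # symmetric eigendecomposition
    order = np.argsort(-np.abs(values))[:k]          # largest |eigenvalue| first
    vecs = vectors[:, order] * np.sqrt(np.abs(values[order]))
    signs = np.where(vecs.sum(axis=0) < 0, -1.0, 1.0)
    return vecs * signs                              # shape (length, k)


def align_score(query_vecs, template_vecs, gap=0.5):
    """Global (Needleman-Wunsch) alignment score of two eigenvector profiles."""
    match = query_vecs @ template_vecs.T             # residue-pair similarity
    n, m = match.shape
    dp = np.zeros((n + 1, m + 1))
    dp[1:, 0] = -gap * np.arange(1, n + 1)           # leading gaps in template
    dp[0, 1:] = -gap * np.arange(1, m + 1)           # leading gaps in query
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i, j] = max(
                dp[i - 1, j - 1] + match[i - 1, j - 1],  # align residue i with j
                dp[i - 1, j] - gap,                      # query residue unaligned
                dp[i, j - 1] - gap,                      # template residue unaligned
            )
    return dp[n, m]


def banded_map(length, offsets):
    """Toy contact map with contacts at the given sequence separations."""
    cmap = np.zeros((length, length))
    for off in offsets:
        idx = np.arange(length - off)
        cmap[idx, idx + off] = 1.0
        cmap[idx + off, idx] = 1.0
    return cmap


# Hypothetical query map and a two-entry toy "fold library".
query = banded_map(30, offsets=(3, 4))
library = {
    "same_fold": banded_map(30, offsets=(3, 4)),
    "other_fold": banded_map(30, offsets=(10,)),
}
q = principal_eigenvectors(query)
scores = {name: align_score(q, principal_eigenvectors(cmap))
          for name, cmap in library.items()}
ranking = sorted(scores, key=scores.get, reverse=True)  # best-scoring entry first
```

Because only contact maps enter the comparison, nothing in this sketch looks at the amino acid sequences themselves, which mirrors the point that the search does not depend directly on sequence homology.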
EigenTHREADER outperforms well-established fold recognition methods such as pGenTHREADER and HHSearch in terms of true positive rate in the difficult task of analogous fold recognition. This should allow template-based modelling to be extended to many new protein families that were previously intractable to homology-based fold recognition methods.
All code used to generate these results and the computational protocol can be downloaded from https://github.com/DanBuchan/eigen_scripts. EigenTHREADER, the benchmark code and the data this paper is based on can be downloaded from http://bioinfadmin.cs.ucl.ac.uk/downloads/eigenTHREADER/.