Zhou Hongyi, Zhou Yaoqi
Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology & Biophysics, State University of New York at Buffalo, 14214, USA.
Proteins. 2005 Feb 1;58(2):321-8. doi: 10.1002/prot.20308.
Recognizing structural similarity without significant sequence identity has proved to be a challenging task. Sequence-based and structure-based methods, as well as their combinations, have been developed. Here, we propose a fold-recognition method that incorporates structural information without the need for sequence-to-structure threading. This is accomplished by generating sequence profiles from protein structural fragments. The structure-derived sequence profiles allow simple integration with evolution-derived sequence profiles and secondary-structure information for an optimized alignment by efficient dynamic programming. The resulting method (called SP(3)) is found to make a statistically significant improvement in both the sensitivity of fold recognition and the accuracy of alignment over the method based on evolution-derived sequence profiles alone (SP) and the method based on an evolution-derived sequence profile and a secondary-structure profile (SP(2)). SP(3) was tested on the SALIGN benchmark for alignment accuracy and on the Lindahl, PROSPECTOR 3.0, and LiveBench 8.0 benchmarks for remote-homology detection and model accuracy. SP(3) is found to be the most sensitive and accurate single-method server in all benchmarks tested where other methods are available for comparison, although its results are statistically indistinguishable from the next best in some cases, and the comparison is subject to the limitation that different methods use time-dependent sequence and/or structural libraries. In LiveBench 8.0, its accuracy rivals that of some consensus methods such as ShotGun-INBGU, Pmodeller3, Pcons4, and ROBETTA. The SP(3) fold-recognition server is available at http://theory.med.buffalo.edu.
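To make the idea of combining several profiles in one dynamic-programming alignment more concrete, here is a minimal sketch. It is not SP(3)'s actual scoring function: the abstract does not give the scoring terms, weights, or gap treatment, so the helper names (profile_score, position_score, align), the dot-product profile score, the weights w_frag and w_ss, the ±w_ss secondary-structure term, and the plain global alignment with a linear gap penalty are all hypothetical choices for illustration. Each query/template position pair is scored as the sum of an evolution-derived term, a fragment (structure)-derived term, and a secondary-structure match term, and the best alignment is then found by standard Needleman-Wunsch dynamic programming.

```python
from typing import List, Sequence, Tuple

# Hypothetical profile representation: one row per residue position,
# one column per residue type.
Profile = Sequence[Sequence[float]]


def profile_score(query_row: Sequence[float], template_row: Sequence[float]) -> float:
    """Dot product of a query frequency row and a template score row (assumed form)."""
    return sum(q * t for q, t in zip(query_row, template_row))


def position_score(i: int, j: int, q_freq: Profile, t_evo: Profile, t_frag: Profile,
                   q_ss: str, t_ss: str, w_frag: float, w_ss: float) -> float:
    """Hypothetical combined score for aligning query position i with template position j."""
    score = profile_score(q_freq[i], t_evo[j])               # evolution-derived term
    score += w_frag * profile_score(q_freq[i], t_frag[j])    # fragment-derived term
    score += w_ss if q_ss[i] == t_ss[j] else -w_ss           # secondary-structure term
    return score


def align(q_freq: Profile, t_evo: Profile, t_frag: Profile, q_ss: str, t_ss: str,
          gap: float = -1.0, w_frag: float = 1.0,
          w_ss: float = 0.5) -> Tuple[float, List[Tuple[int, int]]]:
    """Global (Needleman-Wunsch) alignment with a linear gap penalty (an assumption)."""
    n, m = len(q_freq), len(t_evo)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]   # best score of each prefix pair
    ptr = [[""] * (m + 1) for _ in range(n + 1)]  # traceback moves
    for i in range(1, n + 1):
        F[i][0], ptr[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        F[0][j], ptr[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = F[i - 1][j - 1] + position_score(
                i - 1, j - 1, q_freq, t_evo, t_frag, q_ss, t_ss, w_frag, w_ss)
            up = F[i - 1][j] + gap    # query residue aligned to a gap
            left = F[i][j - 1] + gap  # template residue aligned to a gap
            best = max(diag, up, left)
            F[i][j] = best
            # Ties are broken in the order diag > up > left.
            ptr[i][j] = "diag" if best == diag else ("up" if best == up else "left")
    # Trace back from the bottom-right cell to recover aligned position pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return F[n][m], pairs
```

With a toy two-letter alphabet, a query "ABA" with secondary structure "HHE" aligned to a template "AB" with secondary structure "HH" (using t_evo = t_frag = [[1, -1], [-1, 1]]), the call align([[1, 0], [0, 1], [1, 0]], [[1, -1], [-1, 1]], [[1, -1], [-1, 1]], "HHE", "HH") returns (4.0, [(0, 0), (1, 1)]). Because every term is additive per position pair, adding another profile only changes position_score, not the dynamic-programming recursion; this matches the abstract's point that structure-derived profiles integrate simply with the other information.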