Department of Chemistry, Tongji University, Shanghai 200092, China.
Bioinformatics. 2012 Jan 1;28(1):32-9. doi: 10.1093/bioinformatics/btr611. Epub 2011 Nov 7.
Accurate prediction of protein secondary structure is of key importance for predicting 3D structure and biological function. Although many excellent methods developed over the last few decades have reached prediction accuracies of up to 80%, progress seems to have hit a bottleneck, and further improvements in accuracy have proven difficult.
We propose, for the first time, a structural position-specific scoring matrix (SPSSM), and establish an unprecedented database of 9 million sequences and their SPSSMs. Combined with a purpose-designed BLAST tool, this database provides a novel prediction tool: SPSSMPred. When SPSSMPred was validated on a large dataset (10,814 entries), the Q3 accuracy of protein secondary structure prediction was 93.4%. On the two latest EVA sets, our approach achieved accuracies of 82.7% and 82.0%, far higher than can be achieved using other predictors. For further evaluation, we tested our approach on newly determined sequences (141 entries) and obtained an accuracy of 89.6%. For a set of low-homology proteins (40 entries), SPSSMPred still achieved a Q3 value of 84.6%.
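For readers unfamiliar with the metric, Q3 is the percentage of residues whose predicted three-state label (helix H, strand E, coil C) matches the observed label. The following is a minimal sketch of that calculation only; the function name `q3_accuracy` and the example strings are illustrative assumptions and are not taken from SPSSMPred or its datasets.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the Q3 score (percent of residues with matching 3-state labels).

    Both arguments are strings over the alphabet H (helix), E (strand), C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    allowed = set("HEC")
    if (set(predicted) | set(observed)) - allowed:
        raise ValueError("labels must be one of H, E, C")

    # Count positions where the predicted state equals the observed state.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical 17-residue example (not real data): two residues are mispredicted.
observed = "CCHHHHHHCCEEEEECC"
predicted = "CCHHHHHCCCEEEECCC"
score = q3_accuracy(predicted, observed)
print(f"Q3 = {score:.1f}%")
```

A dataset-level Q3, such as the values reported above, is typically computed over all residues in the set or averaged per protein; the abstract does not state which convention was used.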
The SPSSMPred server is available at http://cal.tongji.edu.cn/SPSSMPred/