School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland.
Sci Rep. 2019 Aug 26;9(1):12374. doi: 10.1038/s41598-019-48786-x.
Protein secondary structure (SS) prediction has been a central topic of research in bioinformatics for decades. Even so, the most sophisticated ab initio SS predictors still fall short of the theoretical limit of three-state prediction accuracy (88-90%), and only a few predict more than the three traditional classes: Helix, Strand and Coil. In this study we test different models trained on both single-sequence and evolutionary profile-based inputs, and develop a new state-of-the-art system, Porter 5. Porter 5 is composed of ensembles of cascaded Bidirectional Recurrent Neural Networks and Convolutional Neural Networks, incorporates new input encoding techniques, and is trained on a large set of protein structures. On a large independent set, Porter 5 achieves 84% accuracy (81% SOV) on 3 classes and 73% accuracy (70% SOV) on 8 classes. In our tests Porter 5 is 2% more accurate than its previous version and outperforms or matches the most recent secondary structure predictors we tested. We obtain similar results when Porter 5 is retrained on SCOPe-based sets that eliminate homology between training and testing samples. Porter is available as a web server and standalone program at http://distilldeep.ucd.ie/porter/ alongside all the datasets and alignments.
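To make the 3-class and 8-class accuracy figures concrete, the following is a minimal sketch of how per-residue accuracy is typically computed for the two class sets. It is illustrative only and is not taken from Porter 5: the 8-to-3 mapping shown is one commonly used reduction of DSSP classes (the abstract does not state which mapping Porter 5 uses), the function names are hypothetical, and the example strings are invented. SOV, a segment-level overlap measure, is not computed here.

```python
# Assumed (commonly used) reduction of the 8 DSSP classes to 3 classes.
# The abstract does not specify Porter 5's mapping; this is an illustration.
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",  # helix types -> Helix
    "E": "E", "B": "E",            # strand and bridge -> Strand
    "T": "C", "S": "C", "C": "C",  # turn, bend, coil -> Coil
}


def reduce_to_3_states(ss8: str) -> str:
    """Map an 8-class secondary structure string to 3 classes."""
    return "".join(DSSP8_TO_3[state] for state in ss8)


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted class equals the observed class."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Invented toy example: 16 residues, observed vs. predicted 8-class labels.
observed_ss8 = "CHHHHGGGTEEEEBSC"
predicted_ss8 = "CHHHHHHHTEEEECSC"

q8 = per_residue_accuracy(predicted_ss8, observed_ss8)
q3 = per_residue_accuracy(
    reduce_to_3_states(predicted_ss8),
    reduce_to_3_states(observed_ss8),
)
print(f"8-class accuracy: {q8:.4f}")  # 0.7500
print(f"3-class accuracy: {q3:.4f}")  # 0.9375
```

In this toy case the 8-class score is lower than the 3-class score because confusions within a merged group (for example G predicted as H) count as errors only in the 8-class setting, which is consistent with 8-class accuracy being the harder target.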