Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, QLD 4222, Australia.
School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China.
Brief Bioinform. 2018 May 1;19(3):482-494. doi: 10.1093/bib/bbw129.
Protein secondary structure prediction began in 1951, when Pauling and Corey predicted helical and sheet conformations for the protein polypeptide backbone, even before the first protein structure was determined. Sixty-five years later, powerful new methods are breathing new life into this field. The highest three-state accuracy achieved without relying on structure templates is now 82-84%, a number unthinkable just a few years ago. These improvements came from increasingly larger databases of protein sequences and structures for training, the use of template secondary structure information and more powerful deep learning techniques. As we approach the theoretical limit of three-state prediction (88-90%), alternatives to secondary structure prediction (prediction of backbone torsion angles and of Cα-atom-based angles and torsion angles) not only have more room for further improvement but also allow direct prediction of three-dimensional fragment structures with constantly improving accuracy. For about 20% of all 40-residue fragments in a database of 1199 non-redundant proteins, the structures predicted by SPIDER2 lie within 6 Å root-mean-squared distance of the native conformations. More powerful deep learning methods with an improved capability of capturing long-range interactions are beginning to emerge as the next generation of techniques for secondary structure prediction. The time has come to finish off the final stretch of the long march towards protein secondary structure prediction.
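For readers unfamiliar with the metric, three-state accuracy is conventionally computed as the fraction of residues whose predicted state (helix H, strand E or coil C) matches the state assigned from the experimental structure. The following is a minimal sketch of that calculation, not the evaluation code of any method discussed here; the function name `q3_accuracy` and the example strings are hypothetical and chosen only for illustration.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the fraction of positions where the three-state labels agree.

    Both arguments are strings over the alphabet H (helix), E (strand)
    and C (coil), one character per residue. This is an illustrative
    helper, not part of any published tool.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    # Count residues whose predicted state equals the observed state.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 10-residue example (made-up labels, not real data).
observed = "CHHHHEEEEC"
predicted = "CHHHCEEECC"
score = q3_accuracy(predicted, observed)
print(f"Q3 = {score:.2f}")
```

In this toy case two of the ten residues are mislabelled, so the score is 0.8. The 82-84% figures quoted above are the same kind of per-residue accuracy, evaluated over large benchmark sets.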