Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan.
Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan.
Biomolecules. 2021 Nov 3;11(11):1627. doi: 10.3390/biom11111627.
Secondary structure prediction (SSP) of proteins is an important structural biology technique with many applications. Roughly 300 algorithms have been published over the past seven decades, competing fiercely on accuracy. In the first 60 years, the accuracy of three-state SSP rose from ~56% to 81%; since then, it has plateaued at 81-86%. In the 1990s, the theoretical limit of three-state SSP accuracy was estimated to be 88%. Thus, SSP is now generally considered either no longer challenging or too challenging to improve. However, we found that the limit of three-state SSP may have been underestimated. In addition, there is still much room for improving segment-based and eight-state SSP, but the limits of these emerging topics have not been determined. This work performs large-scale sequence and structural analyses to estimate SSP accuracy limits and to assess state-of-the-art SSP methods. The limit of three-state SSP is re-estimated to be ~92%, 4-5% higher than previously expected, indicating that SSP is still challenging. The estimated limit of eight-state SSP is 84-87%. Based on our results, we make several proposals for improving future SSP algorithms. We hope that these findings will help advance the development of SSP and all its applications.
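The three-state and eight-state accuracies quoted in the abstract are per-residue agreement percentages between predicted and observed secondary structure strings. The sketch below shows how such a score could be computed. It is not the authors' evaluation code: the function names are hypothetical, the example strings are made up, and the eight-to-three-state reduction (H, G, I to helix; E, B to strand; everything else to coil) is one commonly used convention that is assumed here and may differ from the one used in the paper.

```python
# Assumed DSSP 8-state -> 3-state reduction (a common convention, not
# necessarily the one used in the paper): H/G/I -> H, E/B -> E, rest -> C.
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "T": "C", "S": "C", "-": "C",
}


def reduce_to_three_state(ss8: str) -> str:
    """Map an 8-state string to 3 states; unknown symbols fall back to coil."""
    return "".join(EIGHT_TO_THREE.get(symbol, "C") for symbol in ss8)


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Percentage of positions where the predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("cannot score empty strings")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Made-up 16-residue example, for illustration only.
observed_ss8 = "-HHHHGGGTEEEEBS-"
predicted_ss8 = "-HHHHHHHTTEEEE--"

q8 = per_residue_accuracy(predicted_ss8, observed_ss8)
q3 = per_residue_accuracy(
    reduce_to_three_state(predicted_ss8),
    reduce_to_three_state(observed_ss8),
)
print(f"eight-state accuracy: {q8:.2f}%")  # 62.50%
print(f"three-state accuracy: {q3:.2f}%")  # 93.75%
```

In this toy case the three-state score is higher than the eight-state score because several eight-state mismatches (for example G versus H) collapse into the same three-state class, which is consistent with the abstract reporting a lower estimated limit for eight-state SSP than for three-state SSP.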