Alanazi Wafa, Meng Di, Pollastri Gianluca
School of Computer Science, University College Dublin, Belfield, Dublin D04 C1P1, Ireland.
Department of Computer Science, College of Science, Northern Border University, Arar, Saudi Arabia.
Comput Struct Biotechnol J. 2025 Apr 3;27:1416-1430. doi: 10.1016/j.csbj.2025.04.005. eCollection 2025.
The accurate prediction of protein structures remains a cornerstone challenge in structural bioinformatics, essential for understanding the intricate relationship between protein sequence, structure, and function. Recent advancements in Machine Learning (ML) and Deep Learning (DL) have revolutionized this field, offering innovative approaches to tackle one-dimensional (1D) protein structure annotations, including secondary structure, solvent accessibility, and intrinsic disorder. This review highlights the evolution of predictive methodologies, from early machine learning models to sophisticated deep learning frameworks that integrate sequence embeddings and pretrained language models. Key advancements, such as AlphaFold's transformative impact on structure prediction and the rise of protein language models (PLMs), have enabled unprecedented accuracy in capturing sequence-structure relationships. Furthermore, we explore the role of specialized datasets, benchmarking competitions, and multimodal integration in shaping state-of-the-art prediction models. By addressing challenges in data quality, scalability, interpretability, and task-specific optimization, this review underscores the transformative impact of ML, DL, and PLMs on 1D protein prediction while providing insights into emerging trends and future directions in this rapidly evolving field.