Zhang Liang, Tan Pan, Hong Liang
School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai 200240, China.
Shanghai National Center for Applied Mathematics (SJTU Center) & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China.
Sheng Wu Gong Cheng Xue Bao. 2025 Mar 25;41(3):934-948. doi: 10.13345/j.cjb.240683.
Predicting the effects of protein mutations is a key challenge in bioinformatics and protein engineering. Recent advances in deep learning, particularly the development of protein language models (PLMs), have opened new opportunities in this field. This review summarizes the application of PLMs to mutation effect prediction, focusing on three main types of models: sequence-based models, structure-based models, and models that combine sequence and structural information. We analyze the principles, advantages, and limitations of each type and discuss how unsupervised and supervised learning are used in model training. We also discuss the main current challenges, including the acquisition of high-quality datasets and the handling of data noise. Finally, we outline future research directions, including the prospects for emerging techniques such as multimodal fusion and few-shot learning. This review aims to give researchers a comprehensive perspective that helps advance the prediction of protein mutation effects.