Huang Bin, Kong Lupeng, Wang Chao, Ju Fusong, Zhang Qi, Zhu Jianwei, Gong Tiansu, Zhang Haicang, Yu Chungong, Zheng Wei-Mou, Bu Dongbo
Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China.
Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Changping Laboratory, Beijing 102206, China.
Genomics Proteomics Bioinformatics. 2023 Oct;21(5):913-925. doi: 10.1016/j.gpb.2022.11.014. Epub 2023 Mar 30.
Protein structure prediction is an interdisciplinary research topic that has attracted researchers from multiple fields, including biochemistry, medicine, physics, mathematics, and computer science. These researchers adopt various research paradigms to attack the same structure prediction problem: biochemists and physicists attempt to reveal the principles governing protein folding; mathematicians, especially statisticians, usually start by assuming a probability distribution of protein structures given a target sequence and then find the most likely structure; and computer scientists formulate protein structure prediction as an optimization problem, either finding the structural conformation with the lowest energy or minimizing the difference between the predicted structure and the native structure. These research paradigms fall into the two statistical modeling cultures proposed by Leo Breiman, namely, data modeling and algorithmic modeling. Recently, we have also witnessed the great success of deep learning in protein structure prediction. In this review, we survey the efforts toward protein structure prediction. We compare the research paradigms adopted by researchers from different fields, with an emphasis on the shift of research paradigms in the era of deep learning. In short, algorithmic modeling techniques, especially deep neural networks, have considerably improved the accuracy of protein structure prediction; however, theories that interpret these neural networks and a deeper knowledge of protein folding are still much needed.