Hie Brian L, Yang Kevin K, Kim Peter S
Department of Biochemistry, Stanford University School of Medicine, Stanford, CA 94305, USA; Stanford ChEM-H, Stanford University, Stanford, CA 94305, USA.
Microsoft Research New England, Cambridge, MA 02142, USA.
Cell Syst. 2022 Apr 20;13(4):274-285.e6. doi: 10.1016/j.cels.2022.01.003. Epub 2022 Feb 3.
The degree to which evolution is predictable is a fundamental question in biology. Previous attempts to predict the evolution of protein sequences have been limited to specific proteins and to small changes, such as single-residue mutations. Here, we demonstrate that by using a protein language model to predict the local evolution within protein families, we recover a dynamic "vector field" of protein evolution that we call evolutionary velocity (evo-velocity). Evo-velocity generalizes to evolution over vastly different timescales, from viral proteins evolving over years to eukaryotic proteins evolving over geologic eons, and can predict the evolutionary dynamics of proteins that were not used to develop the original model. Evo-velocity also yields new evolutionary insights by predicting strategies of viral-host immune escape, resolving conflicting theories on the evolution of serpins, and revealing a key role of horizontal gene transfer in the evolution of eukaryotic glycolysis.
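The idea of a "vector field" built from local predictions can be illustrated with a toy sketch. The abstract does not give scoring details, so everything below is an assumption for illustration and not the authors' implementation: `fit_site_model` is a hypothetical per-position frequency model standing in for a protein language model, neighbors are chosen by Hamming distance, and sequences are assumed to be pre-aligned, of equal length, and made up only of the 20 standard amino acids. Each directed edge between neighboring sequences gets a score that is positive when the stand-in model prefers the target sequence over the source.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids (assumed input alphabet)

def fit_site_model(seqs, pseudocount=1.0):
    """Hypothetical stand-in for a language model: per-position log-probabilities."""
    total = len(seqs) + pseudocount * len(ALPHABET)
    model = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        model.append({a: math.log((counts[a] + pseudocount) / total) for a in ALPHABET})
    return model

def edge_velocity(model, a, b):
    """Mean log-probability gain over mismatched sites when moving from a to b."""
    diffs = [model[i][b[i]] - model[i][a[i]] for i in range(len(a)) if a[i] != b[i]]
    return sum(diffs) / len(diffs) if diffs else 0.0

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def velocity_graph(seqs, k=2):
    """Score directed edges from each sequence to its k nearest neighbors."""
    model = fit_site_model(seqs)
    edges = {}
    for i, a in enumerate(seqs):
        others = [j for j in range(len(seqs)) if j != i]
        for j in sorted(others, key=lambda j: hamming(a, seqs[j]))[:k]:
            edges[(i, j)] = edge_velocity(model, a, seqs[j])
    return edges

if __name__ == "__main__":
    toy_family = ["MKTA", "MKTA", "MKTA", "MRTA"]  # made-up aligned sequences
    for (src, dst), score in velocity_graph(toy_family, k=1).items():
        print(f"{toy_family[src]} -> {toy_family[dst]}: {score:+.3f}")
```

In this sketch the score is antisymmetric, so reversing an edge flips its sign. That is what lets the collection of edges be read as a direction of flow across the sequence neighborhood.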