Jesse Horne, Diwakar Shukla
Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States.
Department of Chemical and Biomolecular Engineering and Department of Bioengineering, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States; Department of Plant Biology, Cancer Center at Illinois, and Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States.
Ind Eng Chem Res. 2022 May 18;61(19):6235-6245. doi: 10.1021/acs.iecr.1c04943. Epub 2022 Apr 6.
Proteins are Nature's molecular machinery and fill diverse roles despite consisting of chemically similar building blocks. In recent years, protein engineering and design have become important research areas, with many applications in the pharmaceutical, energy, and biocatalysis fields, among others, where the ultimate aim is to create a protein with desired structural and functional properties. Modeling the relationship between a protein's sequence, folded structure, and biological function is often critical to such protein engineering efforts. However, significant challenges remain in concretely mapping an amino acid sequence to specific protein properties and biological activities. Mutations may enhance or diminish molecular protein function, and epistatic interactions between mutations make the mapping from genetic modifications to protein function inherently complex. Estimating the quantitative effects of mutations on protein function(s), a task often known as variant effect prediction (VEP), therefore remains a grand challenge of biology, bioinformatics, and many related fields, and a successful solution would rapidly accelerate protein engineering. Progress has nonetheless been demonstrated in recent years through the development of machine learning (ML) methods that model the relationship between mutations and protein function. In this Review, recent advances in VEP are discussed as tools for protein engineering, focusing on techniques that incorporate gains from the broader ML community and on the challenges of estimating differences in biomolecular function. The primary developments highlighted include convolutional neural networks, graph neural networks, and natural language embeddings for protein sequences.
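The abstract attributes much of the difficulty of VEP to epistasis. One common way to make that concrete (an assumption here, not a method taken from the Review) is to define epistasis as the deviation of a double mutant's score from the sum of its single-mutant effects, measured on a log scale. The minimal sketch below assumes log-transformed fitness values, all measured on the same assay scale relative to the wild type, with higher values meaning better function. The function names `additive_prediction` and `epistasis` are hypothetical, and the numbers in the usage block are illustrative only, not experimental data.

```python
def additive_prediction(f_wt: float, f_a: float, f_b: float) -> float:
    """Predict a double mutant's log-fitness by summing single-mutant effects.

    Each effect is taken relative to the wild type, so the prediction is
    f_wt + (f_a - f_wt) + (f_b - f_wt), which simplifies to f_a + f_b - f_wt.
    """
    return f_a + f_b - f_wt


def epistasis(f_wt: float, f_a: float, f_b: float, f_ab: float) -> float:
    """Return the non-additive part of the double mutant's log-fitness.

    Zero means the two mutations combine additively. Assuming higher
    log-fitness is better, a negative value means the double mutant performs
    worse than the additive expectation and a positive value means better.
    """
    return f_ab - additive_prediction(f_wt, f_a, f_b)


if __name__ == "__main__":
    # Hypothetical log-fitness scores chosen only to illustrate the arithmetic.
    wt, a, b, ab = 0.0, -1.0, -0.5, -3.0
    print(additive_prediction(wt, a, b))  # -1.5
    print(epistasis(wt, a, b, ab))        # -1.5
```

A purely additive model, by construction, always assigns zero epistasis. Capturing a nonzero term therefore requires a model that can represent interactions between positions.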