Qiu Yuchi, Wei Guo-Wei
Department of Mathematics, Michigan State University, East Lansing, 48824, MI, USA.
Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, 48824, MI, USA.
ArXiv. 2023 Jul 27:arXiv:2307.14587v1.
Protein engineering is an emerging field in biotechnology with the potential to revolutionize areas such as antibody design, drug discovery, food security, and ecology. However, the mutational space involved is too vast to explore through experiments alone. By leveraging the growing body of protein databases, machine learning (ML) models, particularly those based on natural language processing (NLP), have considerably accelerated protein engineering. Moreover, advances in topological data analysis (TDA) and in artificial intelligence-based protein structure prediction, such as AlphaFold2, have made more powerful structure-based, ML-assisted protein engineering strategies possible. This review aims to offer a comprehensive, systematic, and indispensable set of methodological components, including TDA and NLP, for protein engineering and to facilitate their future development.