Shi Zhenkun, Liu Pi, Liao Xiaoping, Mao Zhitao, Zhang Jianqi, Wang Qinhong, Sun Jibin, Ma Hongwu, Ma Yanhe
Key Laboratory of Systems Microbial Technology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China.
National Technology Innovation Center of Synthetic Biology, Tianjin 300308, China.
Biodes Res. 2022 Jun 15;2022:9898461. doi: 10.34133/2022/9898461. eCollection 2022.
Revolutionary breakthroughs in artificial intelligence (AI) and machine learning (ML) have had a profound impact on a wide range of scientific disciplines, including the development of artificial cell factories for biomanufacturing. In this paper, we review the latest studies on the application of data-driven methods to the design of new proteins, pathways, and strains. We first briefly introduce the various types of data and databases relevant to industrial biomanufacturing, which are the basis for data-driven research. Different types of algorithms, including traditional ML and more recent deep learning methods, are also presented. We then demonstrate how these data-based approaches can be applied to address various issues in cell factory development using examples from recent studies, including the prediction of protein function, the improvement of metabolic models, the estimation of missing kinetic parameters, the design of non-natural biosynthesis pathways, and pathway optimization. In the last section, we discuss the current limitations of these data-driven approaches and propose that data-driven methods should be integrated with mechanistic models so that the two complement each other and facilitate the development of synthetic strains for industrial biomanufacturing.