Patra Pradipta, B R Disha, Kundu Pritam, Das Manali, Ghosh Amit
School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India.
B.M.S College of Engineering, Basavanagudi, Bengaluru, Karnataka 560019, India.
Biotechnol Adv. 2023 Jan-Feb;62:108069. doi: 10.1016/j.biotechadv.2022.108069. Epub 2022 Nov 25.
Metabolic engineering encompasses several widely used strategies that currently hold a prominent place in biotechnology, as its potential is manifesting through a plethora of research and commercial products with strong societal impact. The genomic revolution that occurred almost three decades ago initiated the generation of large omics datasets, which have helped in gaining a better understanding of cellular behavior. The course of metabolic engineering based on these large datasets has allowed researchers to gain detailed insights and a reasonable understanding of the intricacies of biosystems. However, the existing trial-and-error approaches for metabolic engineering are laborious and time-intensive when it comes to producing target compounds at high yields through genetic manipulations in host organisms. Machine learning (ML), coupled with the available metabolic engineering test instances and omics data, brings a comprehensive and multidisciplinary approach that enables scientists to evaluate various parameters for effective strain design. This vast amount of biological data should be standardized through knowledge engineering to train different ML models that provide accurate predictions in gene circuit design, protein modification, optimization of bioprocess parameters for scale-up, and screening of hyper-producing, robust cell factories. This review briefly introduces the premise of ML, then covers various ML methods and algorithms alongside the numerous omics datasets available to train ML models for predicting metabolic outcomes with high accuracy. The interplay between ML algorithms and biological datasets through knowledge engineering has guided recent advancements in applications such as CRISPR/Cas systems, gene circuits, protein engineering, metabolic pathway reconstruction, and bioprocess engineering.
Finally, this review addresses the probable challenges of applying ML in metabolic engineering, which will guide researchers toward novel techniques to overcome these limitations.