Department of Chemical Engineering and Analytical Science, The University of Manchester, Manchester, UK.
Department of Chemical and Biochemical Engineering, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, China.
Biotechnol Bioeng. 2022 Feb;119(2):411-422. doi: 10.1002/bit.27980. Epub 2021 Nov 8.
Predictive modeling of new biochemical systems from small datasets is a major challenge. Transfer learning, a subdomain of machine learning that transfers knowledge from a generalized model to a more domain-specific model, offers a promising solution. While transfer learning has been used in natural language processing, image analysis, and chemical engineering fault detection, its application within biochemical engineering has not been systematically explored. In this study, we demonstrated the benefits of transfer learning for predicting the dynamic behavior of new biochemical processes. Two case studies were presented to investigate the accuracy, reliability, and advantages of this modeling approach. We discussed the different transfer learning strategies and the effects of topology on transfer learning, and compared the performance of the transfer learning models against benchmark kinetic and data-driven models. Furthermore, strong connections between the underlying process mechanism and the optimal structure of the transfer learning model were highlighted, suggesting that transfer learning is interpretable and can predict more accurately than a naive data-driven modeling approach. This study therefore presents a novel approach to effectively combining data from different sources for bioprocess simulation.
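The idea of transferring knowledge from a generalized model to a domain-specific one can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify their architectures or data, so the synthetic growth curves, the one-hidden-layer network, and the helper names (`pretrain`, `transfer`, `predict`) are all assumptions made for illustration. The sketch pretrains a small network on abundant data from a hypothetical source process, then freezes the hidden layer and refits only the output layer on a handful of points from a hypothetical new process, which is one common transfer learning strategy.

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden(X, W1, b1):
    """Hidden-layer activations of a one-hidden-layer tanh network."""
    return np.tanh(X @ W1 + b1)

def with_bias(H):
    """Append a column of ones so the output layer has an intercept."""
    return np.hstack([H, np.ones((H.shape[0], 1))])

def predict(X, W1, b1, w2):
    return with_bias(hidden(X, W1, b1)) @ w2

def pretrain(X, y, n_hidden=4, lr=0.05, epochs=2000):
    """Fit all weights on the source data by full-batch gradient descent."""
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = np.zeros(n_hidden + 1)
    n = len(y)
    for _ in range(epochs):
        H = hidden(X, W1, b1)
        Hb = with_bias(H)
        err = Hb @ w2 - y                           # residuals
        grad_w2 = Hb.T @ err / n
        d_pre = np.outer(err, w2[:-1]) * (1 - H**2)  # backprop through tanh
        W1 -= lr * (X.T @ d_pre) / n
        b1 -= lr * d_pre.sum(axis=0) / n
        w2 -= lr * grad_w2
    return W1, b1, w2

def transfer(X_small, y_small, W1, b1):
    """Keep the pretrained hidden layer; refit only the output layer
    by ordinary least squares on the small target dataset."""
    Hb = with_bias(hidden(X_small, W1, b1))
    w2_new, *_ = np.linalg.lstsq(Hb, y_small, rcond=None)
    return w2_new

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# Hypothetical source process: many samples of a logistic growth curve.
t_src = np.linspace(0.0, 10.0, 200)
X_src = (t_src / 10.0).reshape(-1, 1)
y_src = 1.0 / (1.0 + np.exp(-(t_src - 5.0)))

# Hypothetical new process: similar shape, different kinetics, 8 samples.
t_tgt = np.linspace(0.0, 10.0, 8)
X_tgt = (t_tgt / 10.0).reshape(-1, 1)
y_tgt = 0.8 / (1.0 + np.exp(-1.5 * (t_tgt - 4.0)))

W1, b1, w2_source = pretrain(X_src, y_src)
w2_target = transfer(X_tgt, y_tgt, W1, b1)

# Error on the target samples before and after adapting the output layer.
mse_source = mse(predict(X_tgt, W1, b1, w2_source), y_tgt)
mse_transfer = mse(predict(X_tgt, W1, b1, w2_target), y_tgt)
print(f"source model on target data: {mse_source:.5f}")
print(f"after transfer:              {mse_transfer:.5f}")
```

Because the output layer is refit by least squares on the target samples, its error on those samples cannot exceed that of the unadapted source model. How well the adapted model generalizes to unseen target conditions depends on how similar the two processes are, which is the question the case studies in the paper examine.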