Institute of Statistics, University of Natural Resources and Life Sciences, Vienna, Austria.
Institute of Bioprocess Science and Engineering, University of Natural Resources and Life Sciences, Vienna, Austria.
Biotechnol J. 2024 Jan;19(2):e2300554. doi: 10.1002/biot.202300554.
The application of model-based real-time monitoring in biopharmaceutical production is a major step toward quality-by-design and the foundation for model predictive control. Data-driven models have proven to be a viable option for modeling bioprocesses. In the high-stakes setting of biopharmaceutical manufacturing, it is essential to ensure high model accuracy, robustness, and reliability. This is only possible when (i) the data used for modeling are of high quality and sufficient size, (ii) state-of-the-art modeling algorithms are employed, and (iii) the input-output mapping of the model has been characterized. In this study, we evaluate the accuracy of multiple data-driven models in predicting the monoclonal antibody (mAb) concentration, double-stranded DNA concentration, host cell protein concentration, and high-molecular-weight impurity content during elution from a protein A chromatography capture step. The models achieved high-quality predictions, with a normalized root-mean-squared error (NRMSE) of <4% for the mAb concentration and ≈10% for the other process variables. Furthermore, we demonstrate how permutation/occlusion-based methods can be used to understand the dependencies learned by convolutional neural network ensembles, one of the most complex types of data-driven model. We observed that the models generally relied on correlations that agree with first-principles knowledge, thereby bolstering confidence in model reliability. Finally, we present a workflow to assess model behavior in the case of systematic measurement errors that may result from sensor fouling or failure. This study represents a major step toward improving the viability of data-driven models in biopharmaceutical manufacturing.
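The abstract reports accuracy as a normalized root-mean-squared error but does not state how the error is normalized. The minimal sketch below assumes normalization by the range (max minus min) of the observed values. Dividing by the mean or the standard deviation is equally common, so percentages computed this way are not necessarily comparable with the study's figures. The function name `nrmse` is our own.

```python
import math
from typing import Sequence


def nrmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-squared error divided by the range of the observed values.

    Assumption: range normalization. The study may have normalized
    differently (e.g. by the mean or standard deviation).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    span = max(y_true) - min(y_true)
    if span == 0:
        raise ValueError("observed values have zero range; NRMSE is undefined")
    return math.sqrt(mse) / span
```

For observations `[0, 2, 4, 6, 8, 10]` and predictions that are each off by 0.2, the RMSE is 0.2 and the range is 10, so the function returns about 0.02, i.e. an NRMSE of 2%.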
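The abstract does not describe how the permutation/occlusion analysis or the systematic-error workflow was implemented. The following is a minimal, model-agnostic sketch of the two underlying ideas. It assumes tabular inputs and any callable model, not the study's convolutional neural network ensembles. It shows standard permutation feature importance (shuffle one input across samples and measure the rise in error) and a simple sensitivity check that adds a constant offset to one input. The constant offset is an assumed, simplified stand-in for a systematic sensor error. Occlusion on time-series inputs, which masks a window of a signal instead of shuffling a whole column, is not shown. All function names and the toy data are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Row], float]


def rmse(model: Model, X: Sequence[Row], y: Sequence[float]) -> float:
    """Root-mean-squared error of the model's predictions on (X, y)."""
    return math.sqrt(sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y))


def permutation_importance(model: Model, X: Sequence[Row], y: Sequence[float],
                           feature: int, n_repeats: int = 20, seed: int = 0) -> float:
    """Mean increase in RMSE after shuffling one input column across samples.

    A large increase means the model depends strongly on that input.
    """
    rng = random.Random(seed)
    baseline = rmse(model, X, y)
    increases = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        # Rebuild each row with only the chosen column replaced by shuffled values.
        X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
        increases.append(rmse(model, X_perm, y) - baseline)
    return sum(increases) / n_repeats


def offset_response(model: Model, X: Sequence[Row], feature: int, offset: float) -> float:
    """Mean change in prediction when a constant bias is added to one input.

    The constant offset is an assumed, simplified stand-in for a systematic
    measurement error such as a fouled or miscalibrated sensor.
    """
    shifted = [row[:feature] + [row[feature] + offset] + row[feature + 1:] for row in X]
    return sum(model(s) - model(x) for s, x in zip(shifted, X)) / len(X)


if __name__ == "__main__":
    # Hypothetical toy data: two input channels; the toy model ignores channel 1.
    X = [[float(i), float((i * 7) % 5)] for i in range(10)]

    def toy_model(x: Row) -> float:
        return 2.0 * x[0]

    y = [toy_model(x) for x in X]
    print(permutation_importance(toy_model, X, y, feature=0))  # positive
    print(permutation_importance(toy_model, X, y, feature=1))  # 0.0
    print(offset_response(toy_model, X, feature=0, offset=0.5))  # 1.0
```

Because the toy model ignores channel 1, shuffling or offsetting that channel leaves its predictions unchanged. A real data-driven model that reacted strongly to a physically irrelevant input would warrant closer inspection.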