Packiam Kulandai Arockia Rajesh, Ooi Chien Wei, Li Fuyi, Mei Shutao, Tey Beng Ti, Ong Huey Fang, Song Jiangning, Ramanan Ramakrishnan Nagasundara
Chemical Engineering Discipline, School of Engineering, Monash University Malaysia, Jalan Lagoon Selatan, 47500 Bandar Sunway, Malaysia.
Advanced Engineering Platform, Monash University Malaysia, Jalan Lagoon Selatan, 47500 Bandar Sunway, Selangor, Malaysia.
Comput Struct Biotechnol J. 2022 Jun 3;20:2909-2920. doi: 10.1016/j.csbj.2022.06.006. eCollection 2022.
Optimizing the fermentation process for recombinant protein production (RPP) is often resource-intensive. Machine learning (ML) approaches help minimize experimentation and are widely applied in RPP. However, existing ML-based tools focus primarily on features derived from the amino acid sequence and disregard the influence of fermentation process conditions. The present study combines features derived from fermentation process conditions with those derived from the amino acid sequence to construct an ML-based model that predicts the maximal protein yield and the corresponding fermentation conditions for expression of a target recombinant protein in the periplasm. In the first stage, two sets of XGBoost classifiers classify the expression level of the target protein as high (>50 mg/L), medium (between 0.5 and 50 mg/L), or low (<0.5 mg/L). The second stage consists of three regression models, based on support vector machines and random forest, that predict the expression yield for each expression-level class. Independent tests showed that the predictor achieved an overall average accuracy of 75% and a Pearson correlation coefficient of 0.91 for the correctly classified instances. Our model therefore offers a reliable substitute for numerous trial-and-error experiments in identifying the optimal fermentation conditions and yield for RPP. It is also implemented as an open-access web server, PERISCOPE-Opt (http://periscope-opt.erc.monash.edu).
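The two-stage design described in the abstract (classify the expression level, then apply a class-specific regressor) can be illustrated with a minimal sketch. This is not the authors' implementation: the trained XGBoost, support vector machine, and random forest models are replaced by plain callables, the two sets of classifiers are collapsed into a single hypothetical `classifier` argument, and the function names are invented for illustration. Only the 0.5 mg/L and 50 mg/L thresholds come from the abstract; assigning yields of exactly 0.5 or 50 mg/L to the medium class is an assumption.

```python
from typing import Callable, Dict, Sequence, Tuple

# Class boundaries in mg/L, as stated in the abstract.
LOW_MAX = 0.5
HIGH_MIN = 50.0

Features = Sequence[float]


def expression_class(yield_mg_per_l: float) -> str:
    """Label a measured yield as 'low', 'medium', or 'high'.

    Assumption: values exactly on a boundary (0.5 or 50 mg/L) count as 'medium'.
    """
    if yield_mg_per_l < LOW_MAX:
        return "low"
    if yield_mg_per_l > HIGH_MIN:
        return "high"
    return "medium"


def two_stage_predict(
    features: Features,
    classifier: Callable[[Features], str],
    regressors: Dict[str, Callable[[Features], float]],
) -> Tuple[str, float]:
    """Stage 1 predicts the expression class; stage 2 applies that class's regressor.

    `classifier` and `regressors` are hypothetical stand-ins for trained models.
    """
    level = classifier(features)
    predicted_yield = regressors[level](features)
    return level, predicted_yield
```

In a sketch like this, `expression_class` would be used to label training data, while `two_stage_predict` routes a new feature vector (sequence-derived plus fermentation-condition features) to one of three regressors keyed by `"low"`, `"medium"`, and `"high"`.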