Gangwar Neelesh, Balraj Keerthiveena, Rathore Anurag S
School of Interdisciplinary Research, Indian Institute of Technology, Delhi, New Delhi, 110016, India.
Yardi School of Artificial Intelligence, Indian Institute of Technology, Delhi, New Delhi, 110016, India.
Appl Microbiol Biotechnol. 2024 Apr 24;108(1):308. doi: 10.1007/s00253-024-13147-w.
Cell culture media play a critical role in cell growth and propagation by providing a substrate; media components can also modulate the critical quality attributes (CQAs). However, the inherent complexity of cell culture media makes it non-trivial to unravel the impact of individual media components on cell growth and CQAs. In this study, we demonstrate an end-to-end machine learning framework for media component selection and prediction of CQAs. The preliminary dataset for feature selection was generated by culturing CHO-GS (-/-) cells in media formulations with varying metal ion concentrations. The acidic and basic charge variant composition of the innovator product (24.97 ± 0.54% acidic and 11.41 ± 1.44% basic) was chosen as the target variable for evaluating the media formulations. Pearson's correlation coefficient and random forest-based techniques were used for feature ranking and feature selection in the prediction of acidic and basic charge variants. Furthermore, a global interpretation analysis using SHapley Additive exPlanations (SHAP) was used to select optimal features by evaluating the contribution of each feature in the extracted vectors. Finally, the medium combinations were predicted using fifteen different regression models, with hyperparameters optimized by grid search and random search with cross-validation. Experimental results demonstrate that Fe and Zn significantly impact the charge variant profile. This study aims to offer insights pertinent both to innovators seeking to establish a complete pipeline for media development and optimization and to biosimilar manufacturers who strive to demonstrate the analytical and functional biosimilarity of their products to the innovator.
KEY POINTS:
• Developed a framework for optimizing media components and predicting CQAs.
• SHAP enhances global interpretability, aiding informed decision-making.
• Fifteen regression models were employed to predict medium combinations.
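The feature-ranking and hyperparameter-search steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the column names (`Fe`, `Zn`, `Cu`, `Mn`), the toy response function, the helper names, and the hyperparameter grid are all assumptions chosen for illustration, and a single random forest regressor stands in for the fifteen regression models mentioned above. It assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption; NOT measurements from the study).
rng = np.random.default_rng(0)
features = ["Fe", "Zn", "Cu", "Mn"]  # hypothetical metal ion columns
X = rng.uniform(0.0, 1.0, size=(200, len(features)))
# Toy response: only the first two columns influence the target.
y = 25.0 + 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0.0, 0.2, size=200)


def pearson_ranking(X, y, names):
    """Rank features by absolute Pearson correlation with the target."""
    scores = {
        name: abs(np.corrcoef(X[:, i], y)[0, 1]) for i, name in enumerate(names)
    }
    return sorted(scores, key=scores.get, reverse=True)


def forest_ranking(X, y, names):
    """Rank features by impurity-based random forest importance."""
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    order = np.argsort(model.feature_importances_)[::-1]
    return [names[i] for i in order]


def tune_forest(X, y):
    """Grid search with 3-fold cross-validation over a small, assumed grid."""
    grid = {"n_estimators": [50, 100], "max_depth": [2, 4]}
    search = GridSearchCV(
        RandomForestRegressor(random_state=0), grid, cv=3, scoring="r2"
    )
    return search.fit(X, y)


pearson_top = pearson_ranking(X, y, features)
forest_top = forest_ranking(X, y, features)
search = tune_forest(X, y)

print("Pearson ranking:", pearson_top)
print("Forest ranking: ", forest_top)
print("Best parameters:", search.best_params_)
```

For the random search variant, scikit-learn's `RandomizedSearchCV` takes parameter distributions and an `n_iter` budget in place of the exhaustive grid. A SHAP analysis would be a further step on the fitted model and is not shown here.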