Williams Bianca, Löbel Wiebke, Finklea Ferdous, Halloin Caroline, Ritzenhoff Katharina, Manstein Felix, Mohammadi Samira, Hashemi Mohammadjafar, Zweigerdt Robert, Lipke Elizabeth, Cremaschi Selen
Department of Chemical Engineering, Auburn University, Auburn, AL, United States.
Leibniz Research Laboratories for Biotechnology and Artificial Organs (LEBAO), Department of Cardiothoracic, Transplantation and Vascular Surgery, Hannover Medical School, Hanover, Germany.
Front Bioeng Biotechnol. 2020 Jul 23;8:851. doi: 10.3389/fbioe.2020.00851. eCollection 2020.
Human cardiomyocytes (CMs) have potential for use in cell therapy and high-throughput drug screening. Because of the inability to expand adult CMs, their large-scale production from human pluripotent stem cells (hPSC) has been suggested. Significant improvements have been made in understanding directed differentiation processes of CMs from hPSCs and their suspension culture-based production under chemically defined conditions. However, optimization experiments are costly, time-consuming, and highly variable, leading to challenges in developing reliable and consistent protocols for the generation of large CM numbers at high purity. This study examined the ability of data-driven modeling with machine learning to identify key experimental conditions and predict final CM content using data collected during hPSC-cardiac differentiation in advanced stirred tank bioreactors (STBRs). Through feature selection, we identified process conditions, features, and patterns that are the most influential on and predictive of the CM content at the process endpoint, on differentiation day 10 (dd10). Process-related features were extracted from experimental data collected from 58 differentiation experiments by feature engineering. These features included data continuously collected online by the bioreactor system, such as dissolved oxygen concentration and pH patterns, as well as offline determined data, including the cell density, cell aggregate size, and nutrient concentrations. The selected features were used as inputs to construct models to classify the resulting CM content as being "high" or "low" relative to pre-defined thresholds. The models built using random forests and Gaussian process modeling predicted CM content for a differentiation process with 90% accuracy and precision on dd7 of the protocol and with 85% accuracy and 82% precision at a substantially earlier stage: dd5.
These models provide insight into potential key factors affecting hPSC cardiac differentiation to aid in selecting future experimental conditions and can predict the final CM content at earlier process timepoints, providing cost and time savings. This study suggests that data-driven models and machine learning techniques can be employed using existing data for understanding and improving production of a specific cell type, which is potentially applicable to other lineages and critical for realization of their therapeutic applications.
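The accuracy and precision figures quoted in the abstract follow the standard definitions for a two-class classifier. The following is a minimal sketch of how such metrics are computed, not the authors' code. The function name `accuracy_and_precision`, the label strings, and the ten example labels are all hypothetical and chosen only for illustration; the positive class is assumed to be the one that meets the CM-content threshold.

```python
def accuracy_and_precision(y_true, y_pred, positive="high"):
    """Return (accuracy, precision) for two equally long label sequences.

    accuracy  = correct predictions / all predictions
    precision = true positives / (true positives + false positives)
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one prediction is required")

    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    false_pos = sum(t != positive and p == positive for t, p in pairs)

    accuracy = correct / len(pairs)
    predicted_pos = true_pos + false_pos
    # Precision is undefined when nothing was predicted positive.
    precision = true_pos / predicted_pos if predicted_pos else float("nan")
    return accuracy, precision


# Hypothetical labels for ten differentiation runs (illustrative, not study data).
y_true = ["high", "high", "high", "low", "low", "high", "low", "high", "low", "low"]
y_pred = ["high", "high", "low", "low", "high", "high", "low", "high", "low", "low"]

acc, prec = accuracy_and_precision(y_true, y_pred)
print(f"accuracy={acc:.2f}, precision={prec:.2f}")
```

Precision matters here because a run predicted to reach the threshold would presumably be continued to dd10, so a false positive costs the remaining culture time and reagents.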