Department of Computer and Information Sciences, University of Delaware, Newark, Delaware, United States of America.
Center for Bioinformatics and Computational Biology, University of Delaware, Newark, Delaware, United States of America.
PLoS One. 2023 Apr 6;18(4):e0274042. doi: 10.1371/journal.pone.0274042. eCollection 2023.
Chinese hamster ovary (CHO) cells are widely used for mass production of therapeutic proteins in the pharmaceutical industry. With the growing need to optimize the performance of producer CHO cell lines, research on CHO cell line development and bioprocessing has continued to increase in recent decades. Bibliographic mapping and classification of relevant research studies will be essential for identifying research gaps and trends in the literature. To qualitatively and quantitatively understand the CHO literature, we conducted topic modeling using a CHO bioprocess bibliome manually compiled in 2016, and compared the topics uncovered by the Latent Dirichlet Allocation (LDA) models with the human labels of the CHO bibliome. The results show a significant overlap between the manually selected categories and the computationally generated topics, and reveal characteristics specific to the machine-generated topics. To identify relevant CHO bioprocessing papers in new scientific literature, we developed supervised models using Logistic Regression to identify specific article topics and evaluated the results using three CHO bibliome datasets: the Bioprocessing set, the Glycosylation set, and the Phenotype set. The use of top terms as features supports the explainability of document classification results and yields insights on new CHO bioprocessing papers.
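As a rough illustration of how term-based Logistic Regression can make a topic classifier explainable, the sketch below trains a small bag-of-words model in plain Python and lists the terms with the largest positive weights. It is not the authors' pipeline: the toy documents, the labels, the hyperparameters, and all function names (`tokenize`, `vectorize`, `train`, `predict_proba`, `top_terms`) are hypothetical and chosen only for this example, and the term features here are raw counts rather than whatever feature selection the paper used.

```python
import math
from collections import Counter

# Hypothetical toy "abstracts" (NOT from the CHO bibliome).
# Label 1 = glycosylation-related, 0 = other bioprocessing.
TOY_DOCS = [
    ("glycosylation sialylation glycan profile of antibody", 1),
    ("glycan fucosylation and sialylation in cho cells", 1),
    ("n glycan galactosylation affects antibody", 1),
    ("fed batch culture media optimization in bioreactor", 0),
    ("bioreactor temperature shift improves titer", 0),
    ("media feed strategy for cell growth in culture", 0),
]


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def vectorize(text, vocab):
    """Return raw term counts for `text`, ordered like `vocab`."""
    counts = Counter(tokenize(text))
    return [float(counts[term]) for term in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(docs, vocab, lr=0.5, epochs=500, l2=0.01):
    """Fit L2-regularized logistic regression by batch gradient descent."""
    X = [vectorize(text, vocab) for text, _ in docs]
    y = [label for _, label in docs]
    n = len(X)
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(vocab)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(w * v for w, v in zip(weights, xi)) + bias
            err = sigmoid(z) - yi
            for j, v in enumerate(xi):
                grad_w[j] += err * v
            grad_b += err
        # The L2 penalty is applied to the term weights only, not the bias.
        weights = [w - lr * (g / n + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(text, vocab, weights, bias):
    """Probability that `text` belongs to the positive (label 1) class."""
    x = vectorize(text, vocab)
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def top_terms(vocab, weights, k=3):
    """Terms with the largest positive weights, i.e. evidence for label 1."""
    ranked = sorted(zip(weights, vocab), reverse=True)
    return [term for _, term in ranked[:k]]


vocab = sorted({tok for text, _ in TOY_DOCS for tok in tokenize(text)})
weights, bias = train(TOY_DOCS, vocab)

print("Top terms for the positive class:", top_terms(vocab, weights))
print("P(label 1 | 'glycan sialylation'):",
      round(predict_proba("glycan sialylation", vocab, weights, bias), 3))
```

Because each weight is tied to a single term, inspecting the highest-weighted terms gives a direct, if coarse, reading of why a document was assigned to a topic. In practice a library implementation with proper tokenization and a held-out evaluation set would replace this hand-rolled trainer.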