
A heuristic approach to handling missing data in biologics manufacturing databases.

Affiliations

Pembroke College, Cambridge, UK.

Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK.

Publication details

Bioprocess Biosyst Eng. 2019 Apr;42(4):657-663. doi: 10.1007/s00449-018-02059-5. Epub 2019 Jan 8.

DOI:10.1007/s00449-018-02059-5
PMID:30617419
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6430751/
Abstract

The biologics sector has amassed a wealth of data over the past three decades, in line with bioprocess development and manufacturing guidelines, and precise analysis of these data is expected to reveal behavioural patterns in cell populations that can be used to predict how future culture processes might behave. Historical bioprocessing data are likely to comprise experiments conducted with different cell lines to produce different products, possibly years apart; this causes inter-batch variability, while human- and instrument-associated technical oversights leave data points missing. These unavoidable complications necessitate a pre-processing step prior to data mining. This study investigated the efficiency of mean imputation and multivariate regression for filling in missing information in historical bio-manufacturing datasets, and evaluated their performance using symbolic regression models and Bayesian non-parametric models in subsequent data processing. Mean substitution was shown to be a simple and efficient imputation method for relatively smooth, non-dynamical datasets, and regression imputation was effective for dynamical datasets with less than 30% missing data, maintaining the existing standard deviation and shape of the distribution. The nature of the missing information, whether Missing Completely At Random (MCAR), Missing At Random (MAR) or Missing Not At Random (MNAR), emerged as the key feature for selecting the imputation method.
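The abstract contrasts mean substitution with regression imputation and singles out the missingness mechanism as the deciding factor. The Python sketch below illustrates that contrast on hypothetical synthetic data; it is not the authors' implementation. Assumptions not taken from the paper: a single response `y` that rises linearly with culture time `t`, a 30% missing rate, deterministic least-squares regression on `t` as the regression imputer, and the helper names `mean_impute`, `regression_impute`, `mask` and `simulate`, all of which are hypothetical. MNAR is left out because simulating it needs a model of the unobserved values themselves.

```python
import random
import statistics
from typing import Dict, List, Optional, Tuple

# A series with None marking a missing measurement.
Series = List[Optional[float]]


def mean_impute(values: Series) -> List[float]:
    """Mean substitution: fill every gap with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mu = statistics.fmean(observed)
    return [mu if v is None else v for v in values]


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def regression_impute(x: List[float], y: Series) -> List[float]:
    """Deterministic regression imputation: fit y on a fully observed covariate x
    using complete cases only, then replace each gap with the fitted value."""
    complete = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b = fit_line([c[0] for c in complete], [c[1] for c in complete])
    return [a + b * xi if yi is None else yi for xi, yi in zip(x, y)]


def mask(t: List[float], y: List[float], mechanism: str,
         rate: float, rng: random.Random) -> Series:
    """Delete values so that roughly `rate` of them go missing overall."""
    t_max = max(t)
    out: Series = []
    for ti, yi in zip(t, y):
        if mechanism == "MCAR":
            p = rate                   # same chance for every point
        elif mechanism == "MAR":
            p = 2 * rate * ti / t_max  # depends only on the observed covariate t
        else:
            raise ValueError(f"unsupported mechanism: {mechanism}")
        out.append(None if rng.random() < p else yi)
    return out


def simulate(mechanism: str, seed: int = 0, n: int = 500,
             rate: float = 0.3) -> Dict[str, Tuple[float, float]]:
    """Return (mean, sd) of the complete data and of each imputed series."""
    rng = random.Random(seed)
    # Hypothetical synthetic data (not from the paper): a response that rises
    # linearly with culture time t plus Gaussian noise.
    t = [rng.uniform(0.0, 14.0) for _ in range(n)]
    y = [5.0 + 2.0 * ti + rng.gauss(0.0, 1.0) for ti in t]
    y_obs = mask(t, y, mechanism, rate, rng)
    summary = {"complete": (statistics.fmean(y), statistics.stdev(y))}
    for name, filled in (("mean", mean_impute(y_obs)),
                         ("regression", regression_impute(t, y_obs))):
        summary[name] = (statistics.fmean(filled), statistics.stdev(filled))
    return summary


if __name__ == "__main__":
    for mech in ("MCAR", "MAR"):
        for name, (mu, sd) in simulate(mech).items():
            print(f"{mech:4s} {name:10s} mean={mu:6.2f} sd={sd:5.2f}")
```

Running the script prints the mean and standard deviation of the complete data and of each imputed series. Under MCAR, mean substitution should noticeably shrink the standard deviation while regression imputation keeps it close to the complete-data value. Under MAR, where loss is more likely late in the run, mean substitution should also pull the mean downward, whereas regressing on `t` largely removes that bias. Deterministic regression imputation still slightly understates variance because it omits residual noise; stochastic regression or multiple imputation adds that noise back.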


Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/494b/6430751/a872f20c724e/449_2018_2059_Fig1_HTML.jpg

Similar articles

1
A heuristic approach to handling missing data in biologics manufacturing databases.
Bioprocess Biosyst Eng. 2019 Apr;42(4):657-663. doi: 10.1007/s00449-018-02059-5. Epub 2019 Jan 8.
2
Performance of Multiple Imputation Using Modern Machine Learning Methods in Electronic Health Records Data.
Epidemiology. 2023 Mar 1;34(2):206-215. doi: 10.1097/EDE.0000000000001578. Epub 2022 Dec 9.
3
Metaheuristic approaches in biopharmaceutical process development data analysis.
Bioprocess Biosyst Eng. 2019 Sep;42(9):1399-1408. doi: 10.1007/s00449-019-02147-0. Epub 2019 May 22.
4
Evaluating the impact of multivariate imputation by MICE in feature selection.
PLoS One. 2021 Jul 28;16(7):e0254720. doi: 10.1371/journal.pone.0254720. eCollection 2021.
5
Tensor-based methods for handling missing data in quality-of-life questionnaires.
IEEE J Biomed Health Inform. 2014 Sep;18(5):1571-80. doi: 10.1109/JBHI.2013.2288803. Epub 2013 Nov 6.
6
A comparison of multiple imputation methods for handling missing values in longitudinal data in the presence of a time-varying covariate with a non-linear association with time: a simulation study.
BMC Med Res Methodol. 2017 Jul 25;17(1):114. doi: 10.1186/s12874-017-0372-y.
7
Missing data in the American College of Surgeons National Surgical Quality Improvement Program are not missing at random: implications and potential impact on quality assessments.
J Am Coll Surg. 2010 Feb;210(2):125-139.e2. doi: 10.1016/j.jamcollsurg.2009.10.021.
8
Tools for statistical analysis with missing data: application to a large medical database.
Stud Health Technol Inform. 2005;116:181-6.
9
Mechanism-aware imputation: a two-step approach in handling missing values in metabolomics.
BMC Bioinformatics. 2022 May 16;23(1):179. doi: 10.1186/s12859-022-04659-1.
10
Multiple imputation for handling missing outcome data when estimating the relative risk.
BMC Med Res Methodol. 2017 Sep 6;17(1):134. doi: 10.1186/s12874-017-0414-5.

Cited by

1
Development and validation of a prognostic nomogram for unresectable pancreatic ductal adenocarcinoma with synchronous liver metastases: a study based on the SEER database and an external cohort.
Front Oncol. 2025 Aug 27;15:1636715. doi: 10.3389/fonc.2025.1636715. eCollection 2025.
2
To Impute or Not To Impute in Untargeted Metabolomics─That is the Compositional Question.
J Am Soc Mass Spectrom. 2025 Apr 2;36(4):742-759. doi: 10.1021/jasms.4c00434. Epub 2025 Feb 25.
