Missing data imputation in clinical trials using recurrent neural network facilitated by clustering and oversampling.

Affiliations

Institute for Medical Information Processing, Biometry and Epidemiology (IBE), LMU Munich, Munich, Germany.

Alvotech Germany GmbH, Jülich, Germany.

Publication information

Biom J. 2022 Jun;64(5):863-882. doi: 10.1002/bimj.202000393. Epub 2022 Mar 10.

DOI: 10.1002/bimj.202000393
PMID: 35266565
Abstract

In clinical practice, the composition of missing data may be complex, for example, a mixture of missing at random (MAR) and missing not at random (MNAR) mechanisms. Many methods are available under the MAR assumption. Under the MNAR assumption, likelihood-based methods require specification of the joint distribution of the data and the missingness mechanism, and they have been introduced as sensitivity analyses. These classic models rely heavily on their underlying assumptions and, in many realistic scenarios, can produce unreliable estimates. In this paper, we develop a machine learning based missing data prediction framework with the aim of handling more realistic missing data scenarios. We use an imbalanced learning technique (i.e., oversampling of the minority class) to handle the MNAR data. To implement oversampling for longitudinal continuous variables, we first cluster the trajectories via k-means, and we then use a recurrent neural network (RNN) to model the longitudinal data. Further, we apply bootstrap aggregating to improve the accuracy of prediction and to account for the uncertainty of a single prediction. We evaluate the proposed method using simulated data, assessing predictions at both the individual patient level and the overall population level. We demonstrate the strong predictive capability of RNNs for longitudinal data and their flexibility for nonlinear modeling. Overall, compared with standard methods and classic models, the proposed method provides accurate individual predictions for both MAR and MNAR data and reduces the bias that missing data introduce into treatment effect estimation. Finally, we apply the proposed method to a real dataset from an antidepressant clinical trial. In summary, this paper aims to encourage the integration of machine learning strategies for handling missing data in the analysis of randomized clinical trials.
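The abstract combines three generic building blocks: k-means clustering of trajectories, random oversampling of minority clusters, and bootstrap aggregating (bagging) of predictions. The sketch below illustrates how these pieces can fit together, assuming equal-length trajectories with one missing final visit. It is not the authors' implementation: all function names are hypothetical, k-means uses a deterministic farthest-first initialisation chosen here for reproducibility, the "classes" being oversampled are simply the k-means clusters, and the paper's RNN is replaced by a hypothetical one-step linear predictor (`fit_last_step`) to keep the example short and dependency-free.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Trajectory = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length trajectories."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_trajectories(trajs: List[Trajectory], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means on whole trajectories treated as vectors.

    Farthest-first initialisation is an assumption for reproducibility,
    not taken from the paper.
    """
    centers = [list(trajs[0])]
    while len(centers) < k:
        far = max(trajs, key=lambda t: min(_sq_dist(t, c) for c in centers))
        centers.append(list(far))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each trajectory joins its nearest center.
        new = [min(range(k), key=lambda c: _sq_dist(t, centers[c])) for t in trajs]
        if new == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new
        # Update step: each center moves to the visit-wise mean of its members.
        for c in range(k):
            members = [t for t, lab in zip(trajs, labels) if lab == c]
            if members:
                centers[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels


def oversample_clusters(trajs: List[Trajectory], labels: List[int], seed: int = 0) -> List[Trajectory]:
    """Random oversampling: resample members of smaller clusters with
    replacement until every cluster is as large as the largest one."""
    rng = random.Random(seed)
    groups: Dict[int, List[Trajectory]] = {}
    for t, lab in zip(trajs, labels):
        groups.setdefault(lab, []).append(t)
    target = max(len(g) for g in groups.values())
    balanced: List[Trajectory] = []
    for g in groups.values():
        balanced.extend(g)
        balanced.extend(rng.choice(g) for _ in range(target - len(g)))
    return balanced


def fit_last_step(trajs: List[Trajectory]) -> Tuple[float, float]:
    """Hypothetical stand-in for the RNN: least-squares fit of
    y_last = a + b * y_second_to_last."""
    xs = [t[-2] for t in trajs]
    ys = [t[-1] for t in trajs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def bagged_predict(train: List[Trajectory], prefix: Sequence[float],
                   n_boot: int = 200, seed: int = 0) -> Tuple[float, float]:
    """Bootstrap aggregating: refit on resamples of `train`, predict the value
    after `prefix`, and return the mean and SD of the predictions."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_boot):
        sample = [rng.choice(train) for _ in train]  # resample with replacement
        a, b = fit_last_step(sample)
        preds.append(a + b * prefix[-1])
    return statistics.fmean(preds), statistics.stdev(preds)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic scores at 4 visits: 8 declining (majority), 2 flat (minority).
    completers = [[20 - 3 * v + rng.gauss(0, 0.5) for v in range(4)] for _ in range(8)]
    completers += [[20 + 0.5 * v + rng.gauss(0, 0.5) for v in range(4)] for _ in range(2)]
    labels = kmeans_trajectories(completers, k=2)
    balanced = oversample_clusters(completers, labels)
    # A dropout whose final visit is missing and who follows the flat pattern.
    mean, sd = bagged_predict(balanced, [20.0, 20.6, 21.0])
    print(f"predicted final score {mean:.1f} (bootstrap SD {sd:.2f})")
```

In this toy setting, oversampling matters mostly for the bagging step: with only 2 of 10 completers in the flat cluster, roughly one in ten bootstrap resamples of the unbalanced data would contain no flat trajectory at all, forcing the fitted model to extrapolate for a dropout who follows that pattern. The bootstrap SD is used here only as a rough indication of prediction uncertainty.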


Similar articles

1. Missing data imputation in clinical trials using recurrent neural network facilitated by clustering and oversampling.
Biom J. 2022 Jun;64(5):863-882. doi: 10.1002/bimj.202000393. Epub 2022 Mar 10.
2. A hybrid return to baseline imputation method to incorporate MAR and MNAR dropout missingness.
Contemp Clin Trials. 2022 Sep;120:106859. doi: 10.1016/j.cct.2022.106859. Epub 2022 Jul 21.
3. Heckman imputation models for binary or continuous MNAR outcomes and MAR predictors.
BMC Med Res Methodol. 2018 Aug 31;18(1):90. doi: 10.1186/s12874-018-0547-1.
4. Missing data imputation using utility-based regression and sampling approaches.
Comput Methods Programs Biomed. 2022 Nov;226:107172. doi: 10.1016/j.cmpb.2022.107172. Epub 2022 Oct 3.
5. Multiple imputation methods for handling missing data in cost-effectiveness analyses that use data from hierarchical studies: an application to cluster randomized trials.
Med Decis Making. 2013 Nov;33(8):1051-63. doi: 10.1177/0272989X13492203. Epub 2013 Aug 1.
6. Data Missing Not at Random in Mobile Health Research: Assessment of the Problem and a Case for Sensitivity Analyses.
J Med Internet Res. 2021 Jun 15;23(6):e26749. doi: 10.2196/26749.
7. Missing not at random models for latent growth curve analyses.
Psychol Methods. 2011 Mar;16(1):1-16. doi: 10.1037/a0022640.
8. A Realistic Evaluation of Methods for Handling Missing Data When There is a Mixture of MCAR, MAR, and MNAR Mechanisms in the Same Dataset.
Multivariate Behav Res. 2023 Sep-Oct;58(5):988-1013. doi: 10.1080/00273171.2022.2158776. Epub 2023 Jan 4.
9. Evaluation of a weighting approach for performing sensitivity analysis after multiple imputation.
BMC Med Res Methodol. 2015 Oct 13;15:83. doi: 10.1186/s12874-015-0074-2.
10. Approaches for missing covariate data in logistic regression with MNAR sensitivity analyses.
Biom J. 2020 Jul;62(4):1025-1037. doi: 10.1002/bimj.201900117. Epub 2020 Jan 20.

Cited by

1. Advances in analytical approaches for background parenchymal enhancement in predicting breast tumor response to neoadjuvant chemotherapy: A systematic review.
PLoS One. 2025 Mar 7;20(3):e0317240. doi: 10.1371/journal.pone.0317240. eCollection 2025.
2. Comprehensive implementations of multiple imputation using retrieved dropouts for continuous endpoints.
BMC Med Res Methodol. 2025 Feb 21;25(1):47. doi: 10.1186/s12874-025-02494-5.
3. Enhancement Methods of Hydropower Unit Monitoring Data Quality Based on the Hierarchical Density-Based Spatial Clustering of Applications with a Noise-Wasserstein Slim Generative Adversarial Imputation Network with a Gradient Penalty.
Sensors (Basel). 2023 Dec 25;24(1):118. doi: 10.3390/s24010118.
4. A randomized wait-list controlled trial to investigate the role of cognitive mechanisms in parenting interventions on mothers with substance use disorder.
Trials. 2022 Jul 23;23(1):588. doi: 10.1186/s13063-022-06420-8.