The development and validation of prognostic models for overall survival in the presence of missing data in the training dataset: a strategy with a detailed example.

Authors

Royle Kara-Louise, Cairns David A

Affiliation

Clinical Trials Research Unit, Leeds Institute of Clinical Trials Research, University of Leeds, Leeds, UK.

Publication

Diagn Progn Res. 2021 Aug 4;5(1):14. doi: 10.1186/s41512-021-00103-9.

DOI:10.1186/s41512-021-00103-9
PMID:34344484
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8335879/
Abstract

BACKGROUND

The United Kingdom Myeloma Research Alliance (UK-MRA) Myeloma Risk Profile is a prognostic model for overall survival. It was trained and tested on clinical trial data, aiming to improve the stratification of transplant ineligible (TNE) patients with newly diagnosed multiple myeloma. Missing data is a common problem in the development and validation of prognostic models, and decisions on how to address missingness have implications for the choice of methodology.

METHODS

Model building: The training and test datasets were the TNE pathways from two large randomised multicentre, phase III clinical trials. Potential prognostic factors were identified by expert opinion. Missing data in the training dataset was imputed using multiple imputation by chained equations. Univariate analysis fitted Cox proportional hazards models in each imputed dataset with the estimates combined by Rubin's rules. Multivariable analysis applied penalised Cox regression models, with a fixed penalty term across the imputed datasets. The estimates from each imputed dataset and bootstrap standard errors were combined by Rubin's rules to define the prognostic model.

Model assessment: Calibration was assessed by visualising the observed and predicted probabilities across the imputed datasets. Discrimination was assessed by combining the prognostic separation D-statistic from each imputed dataset by Rubin's rules.

Model validation: The D-statistic was applied in a bootstrap internal validation process in the training dataset and an external validation process in the test dataset, where acceptable performance was pre-specified.

Development of risk groups: Risk groups were defined using the tertiles of the combined prognostic index, obtained by combining the prognostic index from each imputed dataset by Rubin's rules.
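The pooling and grouping steps described above can be illustrated with a minimal sketch. It is not the authors' code: the function names (`pool_rubin`, `combined_index`, `tertile_groups`) and the risk-group labels are hypothetical, and it assumes that combining the prognostic index across imputed datasets amounts to averaging each patient's index, which is the point-estimate part of Rubin's rules.

```python
import math
from statistics import mean, quantiles, variance


def pool_rubin(estimates, std_errors):
    """Pool one estimate per imputed dataset using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need one estimate and one SE from each of >= 2 imputed datasets")
    pooled = mean(estimates)                        # pooled point estimate
    within = mean([se ** 2 for se in std_errors])   # mean within-imputation variance
    between = variance(estimates)                   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between          # total variance of the pooled estimate
    return pooled, math.sqrt(total)


def combined_index(index_by_imputation):
    """Average each patient's prognostic index across imputed datasets.

    index_by_imputation holds one list per imputed dataset,
    with patients in the same order in every list.
    """
    return [mean(values) for values in zip(*index_by_imputation)]


def tertile_groups(index):
    """Label each patient by tertile of the index (labels are illustrative)."""
    lower, upper = quantiles(index, n=3)            # two cut points split the data in three
    labels = []
    for value in index:
        if value <= lower:
            labels.append("low")
        elif value <= upper:
            labels.append("intermediate")
        else:
            labels.append("high")
    return labels


if __name__ == "__main__":
    # Hypothetical per-imputation D-statistics and standard errors.
    d_stat, d_se = pool_rubin([0.82, 0.85, 0.84], [0.06, 0.06, 0.06])
    print(f"pooled D = {d_stat:.3f}, SE = {d_se:.3f}")

    # Hypothetical prognostic index for nine patients in two imputed datasets.
    pi = combined_index([[1, 2, 3, 4, 5, 6, 7, 8, 9],
                         [1, 2, 3, 4, 5, 6, 7, 8, 9]])
    print(tertile_groups(pi))
```

The total variance adds the between-imputation spread, inflated by `1 + 1/m`, to the average within-imputation variance, so the pooled standard error is never smaller than the one obtained by ignoring imputation uncertainty.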

RESULTS

The training dataset included 1852 patients, of whom 1268 (68.47%) had complete case data. Ten imputed datasets were generated. The test dataset included 520 patients. The D-statistic for the prognostic model was 0.840 (95% CI 0.716-0.964) in the training dataset and 0.654 (95% CI 0.497-0.811) in the test dataset; the corrected D-statistic was 0.801.
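The reported figures can be cross-checked with a few lines of arithmetic. This sketch rests on two assumptions that the abstract does not state: that each 95% CI is a symmetric normal-approximation interval (estimate ± 1.96 × SE), and that the corrected D-statistic is the training value minus a bootstrap estimate of optimism. The helper `se_from_ci` is hypothetical.

```python
# Figures quoted in the RESULTS paragraph.
n_training, n_complete = 1852, 1268
d_training, ci_training = 0.840, (0.716, 0.964)
d_test, ci_test = 0.654, (0.497, 0.811)
d_corrected = 0.801


def se_from_ci(lower, upper, z=1.96):
    """Back out a standard error, assuming the CI is estimate +/- z * SE."""
    return (upper - lower) / (2 * z)


complete_pct = 100 * n_complete / n_training
se_training = se_from_ci(*ci_training)
se_test = se_from_ci(*ci_test)
optimism = d_training - d_corrected   # assumed meaning of "corrected"

print(f"complete cases: {complete_pct:.2f}%")      # complete cases: 68.47%
print(f"implied SE (training): {se_training:.3f}")  # implied SE (training): 0.063
print(f"implied SE (test): {se_test:.3f}")          # implied SE (test): 0.080
print(f"implied optimism: {optimism:.3f}")          # implied optimism: 0.039
```

Both intervals are symmetric about their point estimates, which is consistent with the normal-approximation assumption.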

CONCLUSION

The decision to impute missing covariate data in the training dataset influenced the methods implemented to train and test the model. To extend current literature and aid future researchers, we have presented a detailed example of one approach. Whilst our example is not without limitations, a benefit is that all of the patient information available in the training dataset was utilised to develop the model.

TRIAL REGISTRATION

Both trials were registered: Myeloma IX (ISRCTN68454111), registered 21 September 2000; Myeloma XI (ISRCTN49407852), registered 24 June 2009.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9b10/8335879/228784afd1cb/41512_2021_103_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9b10/8335879/b1e5be589ca7/41512_2021_103_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9b10/8335879/6db43024e148/41512_2021_103_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9b10/8335879/bd21398103fc/41512_2021_103_Fig4_HTML.jpg