Janssen Research and Development, Titusville, NJ, USA.
Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands.
BMC Med Inform Decis Mak. 2021 Feb 6;21(1):43. doi: 10.1186/s12911-021-01408-x.
Researchers developing prediction models face numerous design choices that may impact model performance. One key decision is how to handle patients who are lost to follow-up. In this paper, we perform a large-scale empirical evaluation of the impact of this decision. In addition, we aim to provide guidance on how to deal with loss to follow-up.
We generate a partially synthetic dataset with complete follow-up and simulate loss to follow-up either at random or based on comorbidity. In addition to our synthetic data study, we investigate 21 real-world data prediction problems. We compare four simple strategies for developing models under a cohort design that encounters loss to follow-up. Three strategies fit a binary classifier on data that (1) include all patients, including those lost to follow-up; (2) exclude all patients lost to follow-up; or (3) exclude only those patients lost to follow-up who do not experience the outcome before being lost. The fourth strategy fits a survival model on data that include all patients. We empirically evaluate discrimination and calibration performance.
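The three binary-classifier cohort definitions above amount to simple inclusion rules. A minimal sketch, using hypothetical record fields (`ltfu` for lost to follow-up, `outcome_before_ltfu` for experiencing the outcome before loss; these names are illustrative, not from the paper):

```python
# Toy patient records; field names are illustrative assumptions.
patients = [
    {"id": 1, "ltfu": False, "outcome_before_ltfu": False},
    {"id": 2, "ltfu": True,  "outcome_before_ltfu": False},
    {"id": 3, "ltfu": True,  "outcome_before_ltfu": True},
]

# Strategy 1: include all patients, even those lost to follow-up.
cohort_1 = patients

# Strategy 2: exclude every patient lost to follow-up.
cohort_2 = [p for p in patients if not p["ltfu"]]

# Strategy 3: exclude only patients lost to follow-up who did NOT
# experience the outcome before being lost.
cohort_3 = [p for p in patients if not p["ltfu"] or p["outcome_before_ltfu"]]
```

Strategy 3 is the asymmetric rule the results below warn against: it keeps lost-to-follow-up patients with the outcome but drops those without it.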
The partially synthetic data study results show that excluding patients who are lost to follow-up can introduce bias when loss to follow-up is common and does not occur at random. However, when loss to follow-up was completely at random, the choice of how to address it had a negligible impact on model discrimination performance. Our empirical real-world data results showed that the four design choices resulted in comparable performance with a 1-year time-at-risk but exhibited differential bias with a 3-year time-at-risk. Removing patients who are lost to follow-up before experiencing the outcome while keeping patients who are lost to follow-up after the outcome can bias a model and should be avoided.
Based on this study, we therefore recommend (1) developing models on data that include patients lost to follow-up and (2) evaluating the discrimination and calibration of models twice: once on a test set that includes patients lost to follow-up and once on a test set that excludes them.
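The dual evaluation in recommendation (2) can be sketched as computing a discrimination metric on both test-set variants. A minimal illustration with a hand-rolled AUC (the pairwise-rank formulation) on made-up scores; the data and the `ltfu` flag are hypothetical, not from the paper:

```python
def auc(y_true, y_score):
    # AUC as the probability that a random positive outranks a random negative,
    # with ties counted as 0.5.
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    pairs = [(p > n) + 0.5 * (p == n) for p in pos for n in neg]
    return sum(pairs) / len(pairs)

# Hypothetical test set: (outcome label, predicted risk, lost to follow-up).
test = [(1, 0.9, False), (0, 0.2, False), (1, 0.3, True), (0, 0.6, True)]

# Evaluate twice: once including and once excluding LTFU patients.
auc_all = auc([y for y, _, _ in test], [s for _, s, _ in test])
kept = [(y, s) for y, s, l in test if not l]
auc_no_ltfu = auc([y for y, _ in kept], [s for _, s in kept])
```

A gap between the two AUCs flags sensitivity of the model's measured performance to how loss to follow-up is handled, which is exactly what the twofold evaluation is meant to surface.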