An empirical analysis of dealing with patients who are lost to follow-up when developing prognostic models using a cohort design.

Affiliations

Janssen Research and Development, Titusville, NJ, USA.

Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands.

Publication information

BMC Med Inform Decis Mak. 2021 Feb 6;21(1):43. doi: 10.1186/s12911-021-01408-x.

Abstract

BACKGROUND

Researchers developing prediction models are faced with numerous design choices that may impact model performance. One key decision is how to include patients who are lost to follow-up. In this paper we perform a large-scale empirical evaluation investigating the impact of this decision. In addition, we aim to provide guidelines for how to deal with loss to follow-up.

METHODS

We generate a partially synthetic dataset with complete follow-up and simulate loss to follow-up based either on random selection or on selection based on comorbidity. In addition to the synthetic data study, we investigate 21 real-world data prediction problems. We compare four simple strategies for developing models when using a cohort design that encounters loss to follow-up. Three strategies employ a binary classifier with data that (1) include all patients (including those lost to follow-up), (2) exclude all patients lost to follow-up, or (3) exclude only those patients lost to follow-up who do not have the outcome before being lost. The fourth strategy uses a survival model with data that include all patients. We empirically evaluate discrimination and calibration performance.
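The four strategies differ only in which patients enter the training data and how they are labelled. The following is a minimal sketch of that selection step, not the authors' implementation. The `Patient` fields, the function names, and the choice to label lost patients without an observed outcome as 0 under strategy 1 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Patient:
    # All field names are hypothetical, chosen for this sketch.
    patient_id: int
    outcome_day: Optional[int]  # days from index to outcome, None if never observed
    last_observed_day: int      # last day the patient is observed in the data


def is_lost(p: Patient, tar_days: int) -> bool:
    """Lost to follow-up: observation ends before the time-at-risk ends."""
    return p.last_observed_day < tar_days


def has_outcome(p: Patient, tar_days: int) -> bool:
    """Outcome observed within the time-at-risk and while still under observation."""
    return (
        p.outcome_day is not None
        and p.outcome_day <= min(tar_days, p.last_observed_day)
    )


def binary_rows(patients: List[Patient], tar_days: int, strategy: int) -> List[Tuple[int, int]]:
    """Return (patient_id, label) rows for binary-classifier strategies 1 to 3."""
    if strategy not in (1, 2, 3):
        raise ValueError("strategy must be 1, 2 or 3")
    rows = []
    for p in patients:
        lost = is_lost(p, tar_days)
        label = int(has_outcome(p, tar_days))
        if strategy == 2 and lost:
            continue  # (2) exclude every patient lost to follow-up
        if strategy == 3 and lost and label == 0:
            continue  # (3) exclude lost patients with no outcome before loss
        rows.append((p.patient_id, label))  # (1) keeps everyone
    return rows


def survival_rows(patients: List[Patient], tar_days: int) -> List[Tuple[int, int, int]]:
    """Strategy 4: (patient_id, time, event) rows with censoring at loss or end of time-at-risk."""
    rows = []
    for p in patients:
        event = int(has_outcome(p, tar_days))
        time = p.outcome_day if event else min(p.last_observed_day, tar_days)
        rows.append((p.patient_id, time, event))
    return rows


cohort = [
    Patient(1, None, 400),  # complete follow-up, no outcome
    Patient(2, 100, 400),   # complete follow-up, outcome
    Patient(3, None, 200),  # lost, no outcome before loss
    Patient(4, 50, 200),    # lost after having the outcome
]
for s in (1, 2, 3):
    print(s, binary_rows(cohort, tar_days=365, strategy=s))
print(4, survival_rows(cohort, tar_days=365))
```

In this toy cohort, strategy 3 keeps patient 4 (lost, but with an outcome) while dropping patient 3 (lost, no outcome), which shifts the outcome rate in the training data upward relative to strategy 1.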

RESULTS

The partially synthetic data study shows that excluding patients who are lost to follow-up can introduce bias when loss to follow-up is common and does not occur at random. However, when loss to follow-up was completely at random, the choice of how to address it had negligible impact on model discrimination performance. In the real-world data, the four strategies for dealing with loss to follow-up yielded comparable performance when the time-at-risk was 1 year but showed differential bias when the time-at-risk was 3 years. Removing patients who are lost to follow-up before experiencing the outcome while keeping patients who are lost to follow-up after the outcome can bias a model and should be avoided.
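The contrast between loss that is completely at random and loss that depends on comorbidity can be illustrated with a small simulation. This is a sketch under stated assumptions: the abstract does not describe the exact selection mechanism, so the function `simulate_loss`, the boolean `comorbid` flag, and the rule of drawing lost patients only from the comorbid group are hypothetical simplifications.

```python
import random
from typing import Dict, List, Set


def simulate_loss(patients: List[Dict], fraction: float, mode: str, seed: int = 0) -> Set[int]:
    """Return the ids of patients marked as lost to follow-up.

    patients: dicts with an 'id' and a boolean 'comorbid' flag (hypothetical schema).
    fraction: target share of the whole cohort to mark as lost.
    mode: 'random' draws from everyone; 'comorbidity' draws only from comorbid patients.
    """
    rng = random.Random(seed)
    if mode == "random":
        pool = list(patients)
    elif mode == "comorbidity":
        pool = [p for p in patients if p["comorbid"]]
    else:
        raise ValueError("mode must be 'random' or 'comorbidity'")
    # Cap at the pool size: there may be fewer comorbid patients than requested.
    n_lost = min(len(pool), int(round(fraction * len(patients))))
    return {p["id"] for p in rng.sample(pool, n_lost)}


# Toy cohort: patients 0 to 3 have a comorbidity, patients 4 to 9 do not.
toy = [{"id": i, "comorbid": i < 4} for i in range(10)]
lost_random = simulate_loss(toy, fraction=0.3, mode="random")
lost_comorbid = simulate_loss(toy, fraction=0.3, mode="comorbidity")
print(len(lost_random), sorted(lost_comorbid))
```

Under the comorbidity mode, every lost patient comes from the sicker subgroup, so excluding lost patients removes a non-representative part of the cohort. That is the situation in which the results report bias.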

CONCLUSION

Based on this study, we recommend (1) developing models using data that include patients who are lost to follow-up and (2) evaluating the discrimination and calibration of models twice: once on a test set including patients lost to follow-up and once on a test set excluding them.
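The second recommendation amounts to running the same evaluation on two versions of the test set. The sketch below assumes each test record is a tuple of predicted risk, observed label, and a flag for loss to follow-up. The helper names are hypothetical, discrimination is measured with a pairwise AUC, and calibration is summarised only as mean predicted risk minus observed outcome rate, which is a simplification of a full calibration assessment.

```python
from typing import Dict, List, Tuple

Record = Tuple[float, int, bool]  # (predicted_risk, label, lost_to_follow_up)


def auc(scores: List[float], labels: List[int]) -> float:
    """Probability that a random positive is scored above a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_gap(scores: List[float], labels: List[int]) -> float:
    """Mean predicted risk minus observed outcome rate; 0 means calibrated on average."""
    return sum(scores) / len(scores) - sum(labels) / len(labels)


def evaluate(records: List[Record]) -> Dict[str, float]:
    scores = [r[0] for r in records]
    labels = [r[1] for r in records]
    return {"auc": auc(scores, labels), "calibration_gap": calibration_gap(scores, labels)}


def evaluate_twice(records: List[Record]) -> Dict[str, Dict[str, float]]:
    """Evaluate on the full test set and on the subset with complete follow-up."""
    return {
        "including_lost": evaluate(records),
        "excluding_lost": evaluate([r for r in records if not r[2]]),
    }


test_set = [
    (0.9, 1, False),
    (0.2, 0, False),
    (0.6, 0, True),
    (0.4, 1, True),
]
results = evaluate_twice(test_set)
print(results)
```

A large difference between the two sets of metrics suggests that performance depends on how patients lost to follow-up are handled in the test data.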

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/159f/7866757/8e96bcc96513/12911_2021_1408_Fig1_HTML.jpg
