Centre for Health Informatics, Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK.
J Clin Epidemiol. 2021 Dec;140:149-158. doi: 10.1016/j.jclinepi.2021.09.008. Epub 2021 Sep 11.
No clear guidance exists on handling missing data at each stage of developing, validating and implementing a clinical prediction model (CPM). We aimed to review the approaches to handling missing data that underlie the CPMs currently recommended for use in UK healthcare.
We conducted a descriptive cross-sectional meta-epidemiological study that identified CPMs recommended by the National Institute for Health and Care Excellence (NICE) and summarized how missing data were handled across their pipelines.
A total of 23 CPMs were included through the sampling strategy. Six missing data strategies were identified: complete case analysis (CCA), multiple imputation, imputation of mean values, k-nearest neighbours imputation, using an additional category for missingness, and considering missing values as risk-factor-absent. Of the development articles, 52% did not report how missing data were handled, as did 48% of the validation articles. CCA was the most common approach used for development (40%) and validation (44%). At implementation, 57% of the CPMs required complete data entry, whilst 43% allowed missing values. Three CPMs handled missing data consistently across their pipelines.
A broad variety of methods for handling missing data underlie the CPMs currently recommended for use in UK healthcare. Missing data handling strategies were generally inconsistent. Better quality assurance of CPMs needs greater clarity and consistency in the handling of missing data.
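To make the strategies named in the results concrete, the following is a minimal illustrative sketch of five of them on a toy dataset. It is not taken from the study or from any of the reviewed CPMs: the records, the variable names (age, sbp, smoker, diabetes) and the helper functions are all hypothetical, and the k-nearest neighbours step is a deliberately simplified single-predictor version. Multiple imputation is omitted because it requires fitting and pooling several imputed datasets.

```python
from statistics import mean

# Hypothetical toy records (not study data); None marks a missing value.
records = [
    {"age": 60, "sbp": 140.0, "smoker": "yes", "diabetes": 1},
    {"age": 50, "sbp": None, "smoker": "no", "diabetes": None},
    {"age": 70, "sbp": 160.0, "smoker": None, "diabetes": 0},
    {"age": 52, "sbp": 130.0, "smoker": "no", "diabetes": 0},
]


def complete_cases(rows):
    """Complete case analysis: keep only rows with no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def impute_mean(rows, key):
    """Mean imputation: fill missing `key` with the mean of observed values."""
    observed_mean = mean(r[key] for r in rows if r[key] is not None)
    return [{**r, key: observed_mean if r[key] is None else r[key]} for r in rows]


def impute_knn(rows, key, by, k=2):
    """Simplified kNN imputation: average `key` over the k rows closest on `by`."""
    donors = [r for r in rows if r[key] is not None]
    out = []
    for r in rows:
        if r[key] is None:
            nearest = sorted(donors, key=lambda d: abs(d[by] - r[by]))[:k]
            r = {**r, key: mean(d[key] for d in nearest)}
        out.append(r)
    return out


def missing_as_category(rows, key, label="missing"):
    """Additional category: treat missingness as its own level of `key`."""
    return [{**r, key: label if r[key] is None else r[key]} for r in rows]


def missing_as_absent(rows, key):
    """Risk-factor-absent: assume a missing binary risk factor is 0 (absent)."""
    return [{**r, key: 0 if r[key] is None else r[key]} for r in rows]


if __name__ == "__main__":
    print(len(complete_cases(records)), "complete cases")
    print(impute_mean(records, "sbp")[1])
    print(impute_knn(records, "sbp", by="age")[1])
    print(missing_as_category(records, "smoker")[2])
    print(missing_as_absent(records, "diabetes")[1])
```

Each helper returns new rows rather than modifying the input, so the same raw records can be passed through different strategies and the results compared, which is the kind of inconsistency across development, validation and implementation that the review examines.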