Analyzing Coarsened and Missing Data by Imputation Methods.

Author information

van der Burg Lars L J, Böhringer Stefan, Bartlett Jonathan W, Bosse Tjalling, Horeweg Nanda, de Wreede Liesbeth C, Putter Hein

Affiliations

Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands.

London School of Hygiene and Tropical Medicine, London, UK.

Publication information

Stat Med. 2025 Mar 15;44(6):e70032. doi: 10.1002/sim.70032.

DOI: 10.1002/sim.70032

PMID: 40042406

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11881681/

Abstract

In many missing data problems, values are not entirely missing but coarsened: instead of the true value, one observes a subset of values, strictly smaller than the full sample space of the variable, to which the true value belongs. In our motivating example of patients with endometrial carcinoma, lymphovascular space invasion (LVSI) can be absent, focally present, or substantially present. For a subset of individuals, however, LVSI is reported only as present, which covers both non-absent categories. Difficulties arise when such coarsened observations are to be used in an imputation procedure. To our knowledge, no clear-cut method for handling an observed subset of values has been described in the literature, and treating such observations as entirely missing could lead to biased estimates. In this paper, we therefore evaluated strategies for dealing with coarsened and missing data in multiple imputation. We tested a number of plausible ad hoc approaches that may already be in use by statisticians. In addition, we propose a principled approach: an adaptation of the SMC-FCS algorithm (SMC-FCS: Coarsening compatible) which ensures that imputed values adhere to the coarsening information. These methods were compared in a simulation study, which shows that methods that prevent imputation of incompatible values, such as the adapted SMC-FCS method, consistently yield lower bias and RMSE and better coverage than methods that ignore coarsening or handle it more naively. The analysis of the motivating example shows that how the coarsening information is handled can matter substantially, leading to different conclusions across methods. Overall, our proposed adaptation of SMC-FCS outperforms the other methods in handling coarsened data, requires limited additional computational cost, and is easily extended to other scenarios.
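The requirement that imputed values adhere to the coarsening information can be illustrated with a minimal Python sketch. This is not the authors' SMC-FCS adaptation, which is not specified in the abstract. It only shows the general idea of restricting a categorical imputation distribution to the observed subset and renormalizing before drawing. The function names and the predicted probabilities are hypothetical, chosen for illustration.

```python
import random

# The three LVSI categories named in the abstract.
LEVELS = ("absent", "focal", "substantial")


def restrict_to_compatible(probs, compatible):
    """Drop incompatible levels and renormalize the remaining probabilities."""
    kept = {lvl: p for lvl, p in probs.items() if lvl in compatible}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("no probability mass on the compatible levels")
    return {lvl: p / total for lvl, p in kept.items()}


def impute_one(probs, compatible, rng):
    """Draw one imputed level from the restricted distribution."""
    restricted = restrict_to_compatible(probs, compatible)
    levels = list(restricted)
    weights = [restricted[lvl] for lvl in levels]
    return rng.choices(levels, weights=weights, k=1)[0]


# Hypothetical predicted probabilities from some imputation model
# for a single patient (assumed values, not taken from the paper).
probs = {"absent": 0.6, "focal": 0.3, "substantial": 0.1}

rng = random.Random(0)

# Coarsened observation: LVSI reported only as "present".
coarsened = {"focal", "substantial"}
draws_coarsened = [impute_one(probs, coarsened, rng) for _ in range(1000)]

# Fully missing observation: every level is compatible.
draws_missing = [impute_one(probs, set(LEVELS), rng) for _ in range(1000)]

print(restrict_to_compatible(probs, coarsened))
print(set(draws_coarsened))
```

In this sketch, a coarsened observation can never be imputed as "absent", whereas treating the same observation as entirely missing would allow that incompatible value to be drawn.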

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf4d/11881681/c24ace9d3cb9/SIM-44-0-g003.jpg
