Analyzing Coarsened and Missing Data by Imputation Methods.

Author information

van der Burg Lars L J, Böhringer Stefan, Bartlett Jonathan W, Bosse Tjalling, Horeweg Nanda, de Wreede Liesbeth C, Putter Hein

Affiliations

Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands.

London School of Hygiene and Tropical Medicine, London, UK.

Publication information

Stat Med. 2025 Mar 15;44(6):e70032. doi: 10.1002/sim.70032.

Abstract

In various missing data problems, values are not entirely missing but are coarsened. For a coarsened observation, the true value itself is not observed; instead, a subset of values to which the true value belongs is observed, and this subset is strictly smaller than the full sample space of the variable. In our motivating example of patients with endometrial carcinoma, the degree of lymphovascular space invasion (LVSI) can be absent, focally present, or substantially present. For a subset of individuals, however, LVSI is reported only as being present, which covers both non-absent options. In the analysis of such a dataset, difficulties arise when coarsened observations are to be used in an imputation procedure. To our knowledge, no clear-cut method has been described in the literature for handling an observed subset of values, and treating such observations as entirely missing could lead to biased estimates. Therefore, in this paper, we evaluated which strategy is best for dealing with coarsened and missing data in multiple imputation. We tested a number of plausible ad hoc approaches, possibly already in use by statisticians. Additionally, we propose a principled approach to this problem: a coarsening-compatible adaptation of the SMC-FCS algorithm that ensures that imputed values adhere to the coarsening information. These methods were compared in a simulation study. The comparison shows that methods that prevent imputation of incompatible values, like the coarsening-compatible SMC-FCS method, perform consistently better in terms of bias and RMSE, and achieve better coverage, than methods that ignore coarsening or handle it in a more naïve way. The analysis of the motivating example shows that the way the coarsening information is handled can matter substantially, leading to different conclusions across methods. Overall, our proposed coarsening-compatible SMC-FCS method outperforms other methods in handling coarsened data, requires limited additional computational cost, and is easily extendable to other scenarios.
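
The central idea in the abstract, that an imputed value should never fall outside the subset reported for a coarsened observation, can be illustrated with a minimal sketch. This is not the authors' SMC-FCS adaptation and does not reproduce any step of that algorithm. It only shows one simple way to restrict a categorical draw to the compatible values, by zeroing the probability of incompatible levels and renormalizing. The level names follow the LVSI example; the probabilities and the helper functions `restrict` and `draw_compatible` are hypothetical and chosen purely for illustration.

```python
import random

# The three LVSI levels from the motivating example.
LEVELS = ("absent", "focal", "substantial")


def restrict(probs, allowed):
    """Zero out levels outside `allowed` and renormalize the rest.

    `probs` maps each level to a predicted probability (hypothetical values
    that an imputation model would supply); `allowed` is the set of levels
    compatible with what was actually recorded.
    """
    kept = {lvl: (probs[lvl] if lvl in allowed else 0.0) for lvl in LEVELS}
    total = sum(kept.values())
    if total <= 0.0:
        raise ValueError("no probability mass on the compatible levels")
    return {lvl: p / total for lvl, p in kept.items()}


def draw_compatible(probs, allowed, rng):
    """Draw one imputed level that is guaranteed to lie in `allowed`."""
    restricted = restrict(probs, allowed)
    weights = [restricted[lvl] for lvl in LEVELS]
    return rng.choices(LEVELS, weights=weights, k=1)[0]


# Illustrative predicted probabilities for one patient (made-up numbers).
predicted = {"absent": 0.5, "focal": 0.3, "substantial": 0.2}

# Coarsened observation: LVSI recorded only as "present".
present = {"focal", "substantial"}
rng = random.Random(0)
imputations = [draw_compatible(predicted, present, rng) for _ in range(5)]

# Fully missing observation: every level is compatible, nothing changes.
unrestricted = restrict(predicted, set(LEVELS))
```

In this sketch a fully observed value corresponds to a one-element `allowed` set and a fully missing value to the complete set of levels, so coarsened observations sit between the two as a partial restriction.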

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf4d/11881681/c24ace9d3cb9/SIM-44-0-g003.jpg
