
Diagnosing problems with imputation models using the Kolmogorov-Smirnov test: a simulation study.

Affiliation

Clinical Epidemiology & Biostatistics Unit, Murdoch Childrens Research Institute, The Royal Children's Hospital, Flemington Road Parkville, Melbourne, Victoria 3052, Australia.

Publication information

BMC Med Res Methodol. 2013 Nov 20;13:144. doi: 10.1186/1471-2288-13-144.

DOI:10.1186/1471-2288-13-144
PMID:24252653
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3840572/
Abstract

BACKGROUND

Multiple imputation (MI) is becoming increasingly popular as a strategy for handling missing data, but there is a scarcity of tools for checking the adequacy of imputation models. The Kolmogorov-Smirnov (KS) test has been identified as a potential diagnostic method for assessing whether the distribution of imputed data deviates substantially from that of the observed data. The aim of this study was to evaluate the performance of the KS test as an imputation diagnostic.

METHODS

Using simulation, we examined whether the KS test could reliably identify departures from the assumptions made in the imputation model. To do this, we examined how the p-values from the KS test behaved when skewed and heavy-tailed data were imputed using a normal imputation model. We varied the amount of missing data, the missing data models and the amount of skewness, and evaluated the performance of the KS test in diagnosing issues with the imputation models under these different scenarios.
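The diagnostic described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' simulation design: the log-normal data, the 30% missing-completely-at-random mechanism, the single marginal normal draw standing in for a normal imputation model, and the helper name `ks_imputation_check` are all hypothetical choices made for the example.

```python
import numpy as np
from scipy import stats


def ks_imputation_check(observed, imputed):
    """Two-sample KS test comparing observed values with imputed values."""
    result = stats.ks_2samp(observed, imputed)
    return result.statistic, result.pvalue


rng = np.random.default_rng(0)
n = 1000

# Assumed data-generating model: skewed (log-normal) variable.
y = rng.lognormal(mean=0.0, sigma=1.0, size=n)

# Assumed missingness: 30% missing completely at random.
missing = rng.random(n) < 0.3
observed = y[~missing]

# Simplified stand-in for a normal imputation model: draw imputations
# from a normal distribution fitted to the observed values.
imputed = rng.normal(observed.mean(), observed.std(ddof=1), size=missing.sum())

stat, p = ks_imputation_check(observed, imputed)
print(f"KS statistic = {stat:.3f}, p-value = {p:.3g}")
```

In this setting the normal model places a noticeable share of imputed values below zero, where the skewed variable has no mass, so the KS test flags the mismatch with a small p-value.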

RESULTS

The KS test was able to flag differences between the observed and imputed values; however, these differences did not always correspond to problems with MI inference for the regression parameter of interest. When there was a strong missing at random (MAR) dependency, the KS p-values were very small regardless of whether the MI estimates were biased, so the KS test could not discriminate between imputed variables that required further investigation and those that did not. The p-values were also sensitive to sample size and the proportion of missing data, which further complicates the interpretation of KS test results.
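The MAR finding can be illustrated with a small sketch in which the imputation model is correctly specified yet the KS test still rejects. Everything here is an assumption made for illustration (the linear model, the logistic missingness mechanism, the sample size, and a single imputation with fixed regression parameters instead of proper posterior draws); it is not the scenario set used in the study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 2000

# Assumed model: y depends linearly on a fully observed covariate x.
x = rng.normal(size=n)
y = x + rng.normal(size=n)

# Strong MAR mechanism: y is more likely to be missing when x is large.
p_miss = 1.0 / (1.0 + np.exp(-2.0 * x))
missing = rng.random(n) < p_miss

# Correctly specified imputation model: linear regression of y on x,
# fitted to the complete cases (np.polyfit returns slope first).
slope, intercept = np.polyfit(x[~missing], y[~missing], 1)
fitted = intercept + slope * x[~missing]
resid_sd = np.std(y[~missing] - fitted, ddof=2)

# Simplification: one imputation with fixed parameter estimates.
imputed = intercept + slope * x[missing] + rng.normal(0.0, resid_sd, missing.sum())

ks = stats.ks_2samp(y[~missing], imputed)
print(f"slope = {slope:.2f}, KS statistic = {ks.statistic:.3f}, p-value = {ks.pvalue:.3g}")
```

The imputed values belong to cases with larger x, so their distribution legitimately differs from that of the observed values. A small KS p-value here reflects the missingness mechanism, not a misspecified imputation model.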

CONCLUSIONS

Given our study results, it is difficult to establish guidelines or recommendations for using the KS test as a diagnostic tool for MI. The investigation of other imputation diagnostics and their incorporation into statistical software are important areas for future research.
