
The impact of misclassifications and outliers on imputation methods.

Authors

Templ M, Ulmer M

Affiliations

Institute for Competitiveness and Communication, School of Business, University of Applied Sciences and Arts Northwestern Switzerland, Olten, Switzerland.

Institute of Data Analysis and Process Design, School of Engineering, Zurich University of Applied Sciences, Winterthur, Switzerland.

Publication information

J Appl Stat. 2024 Mar 5;51(14):2894-2928. doi: 10.1080/02664763.2024.2325969. eCollection 2024.

DOI:10.1080/02664763.2024.2325969
PMID:39450101
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11500630/
Abstract

Many imputation methods have been developed over the years and tested mostly under ideal settings. Surprisingly, there is no detailed research on how imputation methods perform when the idealized assumptions about the data distribution and/or the model are only partly fulfilled. This research examines the susceptibility of imputation techniques to outliers, misclassifications, and incorrect model specifications. Knowing how well the methods hold up in practice is crucial because, in reality, conditions are usually not ideal and model assumptions may not hold: the data may not fit the defined models well, outliers distort the estimates, and misclassifications reduce the quality of most imputation methods. Several evaluation measures are discussed, ranging from comparing imputed values with true values or comparing certain statistics to the performance of classifiers and the variance of estimated parameters. Some well-known imputation methods are compared on real data and in simulations. Robust conditional imputation methods turn out to outperform other methods in both the real-data and the simulation settings.
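
The abstract's point that outliers distort the estimates behind an imputation can be illustrated with a small toy experiment. The sketch below is not the authors' method and does not use their data: it assumes a single numeric variable, values missing completely at random, a handful of gross errors set to a hypothetical value of 1000, and it compares only an unconditional mean fill with an unconditional median fill, which is far simpler than the conditional methods the paper evaluates. All function names and parameters are made up for illustration. The error is measured by comparing imputed values with the true values, one of the evaluation measures the abstract mentions.

```python
import math
import random
import statistics


def fill_missing(values, fill):
    """Replace every None entry with one constant fill value."""
    return [fill if v is None else v for v in values]


def rmse_on_missing(imputed, truth, missing_idx):
    """Root mean squared error, evaluated only on the cells that were missing."""
    squared = [(imputed[i] - truth[i]) ** 2 for i in missing_idx]
    return math.sqrt(sum(squared) / len(squared))


def compare_fills(n=200, n_missing=40, n_outliers=10, seed=0):
    """Return the imputation RMSE of a mean fill and a median fill.

    Assumed toy setup: clean values drawn from N(50, 5), values missing
    completely at random, and gross errors recorded as 1000.0.
    """
    rng = random.Random(seed)
    truth = [rng.gauss(50.0, 5.0) for _ in range(n)]

    # Knock out some cells; these are the ones to be imputed.
    observed = list(truth)
    missing_idx = rng.sample(range(n), n_missing)
    for i in missing_idx:
        observed[i] = None

    # Contaminate a few of the remaining observed cells with outliers.
    still_observed = [i for i in range(n) if observed[i] is not None]
    for i in rng.sample(still_observed, n_outliers):
        observed[i] = 1000.0

    # Estimate both fill values from the contaminated observed data.
    seen = [v for v in observed if v is not None]
    fills = {
        "mean": statistics.fmean(seen),      # non-robust location estimate
        "median": statistics.median(seen),   # robust location estimate
    }
    return {
        name: rmse_on_missing(fill_missing(observed, fill), truth, missing_idx)
        for name, fill in fills.items()
    }


if __name__ == "__main__":
    for name, error in compare_fills().items():
        print(f"{name:>6} fill RMSE: {error:.2f}")
```

Under these assumptions the mean is pulled far toward the contaminated cells while the median barely moves, so the median fill yields the smaller error. This only shows the mechanism in the simplest case; it says nothing about how the specific methods in the paper rank against each other.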


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/11500630/727402d0253c/CJAS_A_2325969_F0001_OC.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/11500630/0dfbfe743154/CJAS_A_2325969_F0002_OC.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/11500630/23f1c291da3a/CJAS_A_2325969_F0003_OB.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/11500630/cd17b31c0454/CJAS_A_2325969_F0004_OB.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/11500630/7ea726066d31/CJAS_A_2325969_F0005_OB.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/11500630/074cea920278/CJAS_A_2325969_F0006_OB.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/11500630/491d86784fbc/CJAS_A_2325969_F0007_OC.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/11500630/e44ad445b11b/CJAS_A_2325969_F0008_OC.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/11500630/0271e7d11a14/CJAS_A_2325969_F0009_OC.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/11500630/3ca1df788491/CJAS_A_2325969_F0010_OC.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/11500630/bdc584c5f139/CJAS_A_2325969_F0011_OB.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a71a/11500630/ae75d9c6d7ad/CJAS_A_2325969_F0012_OB.jpg

