On the Relation between Prediction and Imputation Accuracy under Missing Covariates.

Authors

Ramosaj Burim, Tulowietzki Justus, Pauly Markus

Affiliation

Faculty of Statistics, TU Dortmund University, Joseph-Von-Fraunhofer Str. 2-4, 44227 Dortmund, Germany.

Publication

Entropy (Basel). 2022 Mar 9;24(3):386. doi: 10.3390/e24030386.

DOI:10.3390/e24030386
PMID:35327897
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8947649/

Abstract

Missing covariates in regression or classification problems can prohibit the direct use of advanced tools for further analysis. Recent research has realized an increasing trend towards the use of modern Machine-Learning algorithms for imputation. This originates from their capability of showing favorable prediction accuracy in different learning problems. In this work, we analyze through simulation the interaction between imputation accuracy and prediction accuracy in regression learning problems with missing covariates when Machine-Learning-based methods for both imputation and prediction are used. We see that even a slight decrease in imputation accuracy can seriously affect the prediction accuracy. In addition, we explore imputation performance when using statistical inference procedures in prediction settings, such as the coverage rates of (valid) prediction intervals. Our analysis is based on empirical datasets provided by the UCI Machine Learning repository and an extensive simulation study.
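
The kind of comparison the abstract describes can be illustrated with a small, self-contained simulation. The sketch below is not the authors' study design: it swaps the Machine-Learning-based imputers and predictors for plain linear regression so that it runs with the Python standard library alone, and the data-generating model, the coefficients, the 30% missing rate, and the completely-at-random missingness in a single covariate are all assumptions made for illustration. It reports two numbers per imputation method: the root-mean-square error of the imputed values and the root-mean-square prediction error on a fully observed test set.

```python
import math
import random


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares fit with intercept via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def rmse(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


def simulate(n=2000, miss_rate=0.3, seed=1):
    """Return {imputer name: (imputation RMSE, prediction RMSE)}."""
    rng = random.Random(seed)

    def draw(k):
        # Assumed toy model: x2 is correlated with x1, y is linear in both.
        x1 = [rng.gauss(0, 1) for _ in range(k)]
        x2 = [0.8 * a + 0.6 * rng.gauss(0, 1) for a in x1]
        y = [1.0 + 2.0 * a + 3.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        return x1, x2, y

    x1, x2, y = draw(n)        # training data, x2 will be partly deleted
    t1, t2, ty = draw(n)       # fully observed test data

    # Delete x2 completely at random (an assumed missingness mechanism).
    missing = [rng.random() < miss_rate for _ in range(n)]
    obs = [i for i in range(n) if not missing[i]]
    mis = [i for i in range(n) if missing[i]]

    mean_x2 = sum(x2[i] for i in obs) / len(obs)
    reg = fit_ols([[x1[i]] for i in obs], [x2[i] for i in obs])
    imputers = {
        "mean": lambda i: mean_x2,                      # ignores x1
        "regression": lambda i: predict(reg, [x1[i]]),  # uses x1
    }

    results = {}
    for name, impute in imputers.items():
        filled = [impute(i) if missing[i] else x2[i] for i in range(n)]
        imp_err = rmse([filled[i] for i in mis], [x2[i] for i in mis])
        coef = fit_ols(list(zip(x1, filled)), y)
        pred_err = rmse([predict(coef, r) for r in zip(t1, t2)], ty)
        results[name] = (imp_err, pred_err)
    return results


if __name__ == "__main__":
    for name, (imp_err, pred_err) in simulate().items():
        print(f"{name:>10}: imputation RMSE={imp_err:.3f}, prediction RMSE={pred_err:.3f}")
```

In this toy setting the less accurate imputer (mean imputation) also yields the larger prediction error, which is the qualitative pattern the abstract reports. The sizes of the effects here say nothing about the paper's actual results, which rely on Machine-Learning methods and UCI datasets.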

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d45/8947649/9e1f9ec5e7c4/entropy-24-00386-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d45/8947649/9c8420929127/entropy-24-00386-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d45/8947649/c053673d4d29/entropy-24-00386-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d45/8947649/c1b1384b86d7/entropy-24-00386-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d45/8947649/3930f7bcaab9/entropy-24-00386-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d45/8947649/5dc715ccc55a/entropy-24-00386-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d45/8947649/0e502274fd96/entropy-24-00386-g007.jpg
