Variable Selection in the Presence of Missing Data: Imputation-based Methods

Authors

Zhao Yize, Long Qi

Affiliations

Department of Healthcare Policy and Research, Weill Cornell Medical College, Cornell University.

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania.

Publication information

Wiley Interdiscip Rev Comput Stat. 2017 Sep-Oct;9(5). doi: 10.1002/wics.1402. Epub 2017 May 24.

DOI:10.1002/wics.1402
PMID:29085552
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5659333/

Abstract

Variable selection plays an essential role in regression analysis: it identifies important variables that are associated with outcomes and is known to improve the predictive accuracy of the resulting models. Variable selection methods have been widely investigated for fully observed data. In the presence of missing data, however, variable selection methods need to be carefully designed to account for the missing data mechanism and the statistical techniques used to handle the missing data. Because imputation is arguably the most popular method for handling missing data due to its ease of use, statistical methods for variable selection that are combined with imputation are of particular interest. These methods, which are valid under the assumptions of missing at random (MAR) and missing completely at random (MCAR), largely fall into three general strategies. The first strategy applies existing variable selection methods to each imputed dataset and then combines the variable selection results across all imputed datasets. The second strategy applies existing variable selection methods to stacked imputed datasets. The third strategy combines resampling techniques such as the bootstrap with imputation. Despite recent advances, this area remains underdeveloped and offers fertile ground for further research.
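
The three strategies named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: the toy data generator, the hot-deck style `impute` function (a deliberately crude stand-in for a proper multiple-imputation model), the correlation-threshold `select` function (a stand-in for an existing selector such as the lasso), the majority-vote combining rule, and all function names and default parameters.

```python
import random


def simulate(n=300, p=5, miss_rate=0.2, seed=0):
    """Toy data (assumed): y depends on x0 and x1 only; x0 and x2 are MCAR-missing."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(p)]
        y.append(2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 1))
        for j in (0, 2):  # mask values completely at random, after y is generated
            if rng.random() < miss_rate:
                row[j] = None
        X.append(row)
    return X, y


def impute(X, rng):
    """Placeholder imputation: fill each gap with a random observed value of that column."""
    p = len(X[0])
    observed = [[r[j] for r in X if r[j] is not None] for j in range(p)]
    return [[v if v is not None else rng.choice(observed[j]) for j, v in enumerate(r)]
            for r in X]


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


def select(X, y, threshold=0.3):
    """Placeholder selector: keep predictors whose |correlation with y| exceeds a threshold."""
    return {j for j in range(len(X[0]))
            if abs(corr([r[j] for r in X], y)) > threshold}


def strategy_separate(X, y, m=5, seed=1):
    """Strategy 1: select on each imputed dataset, then combine (here by majority vote)."""
    rng = random.Random(seed)
    votes = [0] * len(X[0])
    for _ in range(m):
        for j in select(impute(X, rng), y):
            votes[j] += 1
    return sorted(j for j, v in enumerate(votes) if v > m / 2)


def strategy_stacked(X, y, m=5, seed=1):
    """Strategy 2: stack the m imputed datasets and run the selector once."""
    rng = random.Random(seed)
    stacked_X, stacked_y = [], []
    for _ in range(m):
        stacked_X.extend(impute(X, rng))
        stacked_y.extend(y)
    return sorted(select(stacked_X, stacked_y))


def strategy_bootstrap(X, y, b=50, min_freq=0.6, seed=1):
    """Strategy 3: bootstrap the incomplete data, impute each resample, select,
    and keep predictors whose selection frequency reaches min_freq."""
    rng = random.Random(seed)
    n = len(X)
    counts = [0] * len(X[0])
    for _ in range(b):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = impute([X[i] for i in idx], rng)
        yb = [y[i] for i in idx]
        for j in select(Xb, yb):
            counts[j] += 1
    return sorted(j for j, c in enumerate(counts) if c / b >= min_freq)


if __name__ == "__main__":
    X, y = simulate()
    print("separate :", strategy_separate(X, y))
    print("stacked  :", strategy_stacked(X, y))
    print("bootstrap:", strategy_bootstrap(X, y))
```

In practice the placeholder `impute` and `select` functions would be swapped for a real imputation model and a real selection method. The sketch only shows where imputation, selection, and combining sit relative to each other in each strategy.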
