How to apply variable selection machine learning algorithms with multiply imputed data: A missing discussion.

Affiliations

Department of Quantitative Health Sciences, Mayo Clinic.

Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles.

Publication information

Psychol Methods. 2023 Apr;28(2):452-471. doi: 10.1037/met0000478. Epub 2022 Feb 3.

DOI: 10.1037/met0000478
PMID: 35113633
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10117422/
Abstract

Psychological researchers often use standard linear regression to identify relevant predictors of an outcome of interest, but challenges emerge with incomplete data and growing numbers of candidate predictors. Regularization methods like the LASSO can reduce the risk of overfitting, increase model interpretability, and improve prediction in future samples; however, handling missing data when using regularization-based variable selection methods is complicated. Using listwise deletion or an ad hoc imputation strategy to deal with missing data when using regularization methods can lead to loss of precision, substantial bias, and a reduction in predictive ability. In this tutorial, we describe three approaches for fitting a LASSO when using multiple imputation to handle missing data and illustrate how to implement these approaches in practice with an applied example. We discuss implications of each approach and describe additional research that would help solidify recommendations for best practices. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
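
The abstract does not name the three approaches, so the following is only a rough sketch of one way a LASSO could be combined with multiple imputation: fit it separately to each imputed data set and tally how often each predictor is selected. It is not taken from the article and may not correspond to any of the approaches the tutorial covers. The imputation step is deliberately crude (random draws from each column's observed mean and SD) and stands in for a proper imputation model. The simulated data, the penalty value, and every function name here are hypothetical.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            norm = sum(v * v for v in col) / n
            if norm == 0.0:
                continue
            # covariance of column j with the partial residual (column j excluded)
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / norm
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
    return beta

def impute_once(X, rng):
    """ASSUMPTION: crude stand-in for a real imputation model.
    Each missing value (None) is drawn from N(observed mean, observed SD)."""
    out = [list(row) for row in X]
    for j in range(len(X[0])):
        obs = [row[j] for row in X if row[j] is not None]
        mu = sum(obs) / len(obs)
        sd = (sum((v - mu) ** 2 for v in obs) / (len(obs) - 1)) ** 0.5
        for row in out:
            if row[j] is None:
                row[j] = rng.gauss(mu, sd)
    return out

def center(X, y):
    """Center predictors and outcome so the model needs no intercept."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    return Xc, [v - ybar for v in y]

def selection_counts(X_missing, y, lam, m, seed=0):
    """Fit a LASSO to each of m imputed data sets; count nonzero coefficients."""
    rng = random.Random(seed)
    counts = [0] * len(X_missing[0])
    for _ in range(m):
        Xc, yc = center(impute_once(X_missing, rng), y)
        for j, b in enumerate(lasso_cd(Xc, yc, lam)):
            if b != 0.0:
                counts[j] += 1
    return counts

# Hypothetical simulated data: only predictor 0 affects the outcome,
# and about 20% of its values are set to missing.
data_rng = random.Random(1)
X_full = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [2.0 * row[0] + data_rng.gauss(0, 0.5) for row in X_full]
X_missing = [[None if (j == 0 and data_rng.random() < 0.2) else v
              for j, v in enumerate(row)] for row in X_full]

counts = selection_counts(X_missing, y, lam=0.3, m=5)
print(counts)  # number of imputed data sets in which each predictor was selected
```

A predictor selected in every imputed data set is a stronger candidate than one selected in only a few. How to turn such counts into a final model is a judgment call that this sketch leaves open.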

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6168/10117422/6e0b8be6074f/nihms-1881707-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6168/10117422/fcf1e53527ae/nihms-1881707-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6168/10117422/9428fcca8359/nihms-1881707-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6168/10117422/6722f0306e60/nihms-1881707-f0004.jpg
