How to apply variable selection machine learning algorithms with multiply imputed data: A missing discussion.

Affiliations

Department of Quantitative Health Sciences, Mayo Clinic.

Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles.

Publication information

Psychol Methods. 2023 Apr;28(2):452-471. doi: 10.1037/met0000478. Epub 2022 Feb 3.

DOI:10.1037/met0000478

PMID:35113633

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10117422/

Abstract

Psychological researchers often use standard linear regression to identify relevant predictors of an outcome of interest, but challenges emerge with incomplete data and growing numbers of candidate predictors. Regularization methods like the LASSO can reduce the risk of overfitting, increase model interpretability, and improve prediction in future samples; however, handling missing data when using regularization-based variable selection methods is complicated. Using listwise deletion or an ad hoc imputation strategy to deal with missing data when using regularization methods can lead to loss of precision, substantial bias, and a reduction in predictive ability. In this tutorial, we describe three approaches for fitting a LASSO when using multiple imputation to handle missing data and illustrate how to implement these approaches in practice with an applied example. We discuss implications of each approach and describe additional research that would help solidify recommendations for best practices. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
