Transformers deep learning models for missing data imputation: an application of the ReMasker model on a psychometric scale.

Authors

Casella Monica, Milano Nicola, Dolce Pasquale, Marocco Davide

Affiliations

Natural and Artificial Cognition Laboratory, Department of Humanistic Studies, University of Naples "Federico II", Naples, Italy.

Department of Translational Medical Science, University of Naples "Federico II", Naples, Italy.

Publication information

Front Psychol. 2024 Dec 17;15:1449272. doi: 10.3389/fpsyg.2024.1449272. eCollection 2024.

DOI:10.3389/fpsyg.2024.1449272

PMID:39744035

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11688576/

Abstract

INTRODUCTION

Missing data pose a substantial challenge in psychometric research, undermining the reliability and validity of study outcomes. Contributing factors include participant non-response, dropout, and technical errors during data collection. Traditional methods commonly used to handle missing data, such as mean imputation or regression, rely on assumptions that may not hold for psychological data and can distort results.

METHODS

This study aims to evaluate the effectiveness of transformer-based deep learning for missing data imputation, comparing ReMasker, a masking autoencoding transformer model, with conventional imputation techniques (mean and median imputation, Expectation-Maximization algorithm) and machine learning approaches (K-nearest neighbors, MissForest, and an Artificial Neural Network). A psychometric dataset from the COVID distress repository was used, with imputation performance assessed through the Root Mean Squared Error (RMSE) between the original and imputed data matrices.
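The evaluation loop described above (hide known values, impute them, score the reconstruction with RMSE) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the synthetic 5-point Likert-style data, the 20% missing-completely-at-random rate, the helper names (`mask_mcar`, `impute_column_stat`, `rmse_on_missing`), and the choice to compute RMSE only over the masked entries are all assumptions made for the example. Only the two simplest baselines named in the abstract, mean and median imputation, are shown.

```python
import numpy as np


def mask_mcar(data, missing_rate, rng):
    """Return a copy of `data` with entries set to NaN completely at random,
    plus the boolean mask marking which entries were hidden."""
    mask = rng.random(data.shape) < missing_rate
    masked = data.astype(float).copy()
    masked[mask] = np.nan
    return masked, mask


def impute_column_stat(masked, stat):
    """Fill the NaNs of each column with a statistic of that column's
    observed values (pass np.nanmean or np.nanmedian as `stat`)."""
    fill = stat(masked, axis=0)  # one fill value per column
    return np.where(np.isnan(masked), fill, masked)


def rmse_on_missing(original, imputed, mask):
    """Root mean squared error restricted to the artificially hidden entries
    (an assumption; the score could also be taken over the full matrix)."""
    diff = original[mask] - imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical stand-in for a psychometric scale: 200 respondents, 10 items,
# responses on a 1-5 scale.
rng = np.random.default_rng(0)
original = rng.integers(1, 6, size=(200, 10)).astype(float)

masked, mask = mask_mcar(original, missing_rate=0.2, rng=rng)

mean_imputed = impute_column_stat(masked, np.nanmean)
median_imputed = impute_column_stat(masked, np.nanmedian)

rmse_mean = rmse_on_missing(original, mean_imputed, mask)
rmse_median = rmse_on_missing(original, median_imputed, mask)

print(f"mean imputation RMSE:   {rmse_mean:.3f}")
print(f"median imputation RMSE: {rmse_median:.3f}")
```

Any other imputer, including a learned model, could be dropped into the same loop as long as it returns a completed matrix of the same shape, which is what makes RMSE against the original matrix a common yardstick across methods.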

RESULTS

Results indicate that machine learning techniques, particularly ReMasker, achieve superior performance in terms of reconstruction error compared to conventional imputation techniques across all tested scenarios.

DISCUSSION

This finding underscores the potential of transformer-based models to provide robust imputation in psychometric research, enhancing data integrity and generalizability.
