Handling high-dimensional data with missing values by modern machine learning techniques.

Author information

Chen Sixia, Xu Chao

Affiliation

Department of Biostatistics and Epidemiology, The University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA.

Publication information

J Appl Stat. 2022 May 1;50(3):786-804. doi: 10.1080/02664763.2022.2068514. eCollection 2023.

Abstract

High-dimensional data are regarded as one of the most important types of big data in practice. They arise frequently in fields such as genetics, finance, and geography. Missing data in high-dimensional data analysis should be handled properly to reduce nonresponse bias. We discuss modern machine learning techniques for handling missing data with high dimensionality, including penalized regression approaches, tree-based approaches, and deep learning (DL). Specifically, our proposed methods can be used to estimate general parameters of interest, including population means and percentiles, with imputation-based estimators, propensity score estimators, and doubly robust estimators. We compare these methods through limited simulation studies and a real application. Both the simulation studies and the real application show the benefits of the DL and XGBoost approaches compared with other methods in terms of balancing bias and variance.
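
The three estimator families named in the abstract can be illustrated for a population mean with a minimal sketch. This is not the authors' implementation: it assumes a single covariate, data missing at random, and uses ordinary least squares and logistic regression as stand-ins for the penalized regression, tree-based, and DL learners the paper discusses. The function names (`simulate`, `fit_linear`, `fit_logistic`, `estimate_mean`) and the toy data model are hypothetical, chosen only for illustration.

```python
import math
import random


def simulate(n, seed=0):
    """Toy data (an assumption): y = 1 + 2x + noise, so the true mean of y is 1."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
    # r = 1 means y is observed; the response probability depends only on x.
    r = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(0.5 + xi))) else 0 for xi in x]
    return x, y, r


def fit_linear(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def fit_logistic(x, r, iters=25):
    """Newton-Raphson fit of P(r = 1 | x) = 1 / (1 + exp(-(a + b*x)))."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ri in zip(x, r):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += ri - p
            g1 += (ri - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 Newton system H * step = g in closed form.
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def estimate_mean(x, y, r):
    """Return naive, imputation, propensity score, and doubly robust estimates."""
    n = len(x)
    # Outcome model m(x), fitted on respondents only.
    xo = [xi for xi, ri in zip(x, r) if ri]
    yo = [yi for yi, ri in zip(y, r) if ri]
    a, b = fit_linear(xo, yo)
    m = [a + b * xi for xi in x]
    # Response propensity model pi(x), fitted on all units.
    c, d = fit_logistic(x, r)
    pi = [1.0 / (1.0 + math.exp(-(c + d * xi))) for xi in x]
    rows = list(zip(y, r, m, pi))
    return {
        # Complete-case mean, shown only as a biased baseline.
        "naive": sum(yo) / len(yo),
        # Imputation: replace each missing y with its predicted value.
        "imputation": sum(yi if ri else mi for yi, ri, mi, _ in rows) / n,
        # Propensity score: weight each respondent by 1 / pi(x).
        "propensity": sum(yi / pii if ri else 0.0 for yi, ri, _, pii in rows) / n,
        # Doubly robust: prediction plus a weighted residual correction.
        "doubly_robust": sum(
            mi + ((yi - mi) / pii if ri else 0.0) for yi, ri, mi, pii in rows
        ) / n,
    }


x, y, r = simulate(5000, seed=1)
for name, value in estimate_mean(x, y, r).items():
    print(f"{name}: {value:.3f}")
```

In this toy setup the complete-case mean is biased upward because units with large x respond more often, while the three adjusted estimators target the true mean of 1. The doubly robust form remains consistent if either the outcome model or the propensity model is correctly specified, which is the usual motivation for combining the two.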

