Mixed Matrix Completion in Complex Survey Sampling under Heterogeneous Missingness.

Authors

Mao Xiaojun, Wang Hengfang, Wang Zhonglei, Yang Shu

Affiliations

School of Mathematical Sciences, Ministry of Education Key Laboratory of Scientific and Engineering Computing, Shanghai Jiao Tong University, Shanghai, 200240, China.

School of Mathematics and Statistics & Fujian Provincial Key Laboratory of Statistics and Artificial Intelligence, Fujian Normal University, Fujian 350007, China.

Publication details

J Comput Graph Stat. 2024;33(4):1320-1328. doi: 10.1080/10618600.2024.2319154. Epub 2024 Mar 29.

DOI: 10.1080/10618600.2024.2319154

PMID: 39720102

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11664600/

Abstract

Modern surveys with large sample sizes and growing mixed-type questionnaires require robust and scalable analysis methods. In this work, we consider recovering a mixed dataframe matrix, obtained by complex survey sampling, with entries following different canonical exponential distributions and subject to heterogeneous missingness. To tackle this challenging task, we propose a two-stage procedure: in the first stage, we model the entry-wise missing mechanism by logistic regression, and in the second stage, we complete the target parameter matrix by maximizing a weighted log-likelihood with a low-rank constraint. We propose a fast and scalable estimation algorithm that achieves sublinear convergence, and the upper bound for the estimation error of the proposed method is rigorously derived. Experimental results support our theoretical claims, and the proposed estimator shows its merits compared to other existing methods. The proposed method is applied to analyze the National Health and Nutrition Examination Survey data. Supplementary materials for this article are available online.
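The two-stage procedure summarized above can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: (i) stage 1 fits a separate logistic regression of each column's response indicator r_ij on row-level covariates x_i; (ii) the weight of an observed entry is taken as w_ij = d_i r_ij / pi_hat_ij, combining a survey design weight d_i with the inverse of the estimated response propensity; (iii) the column types are Gaussian (unit variance), Bernoulli and Poisson, each with its canonical link, so the weighted negative log-likelihood is sum_ij w_ij (b_j(theta_ij) - y_ij theta_ij); and (iv) the low-rank constraint is enforced by projected gradient descent with a truncated SVD and a backtracking step size, a generic solver swapped in for the paper's algorithm, with no claim about its convergence rate. The function names (estimate_propensities, complete, and so on) and the synthetic data are hypothetical.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


# ---- Stage 1: entry-wise response propensities (assumed per-column logistic model) ----
def fit_logistic(Xc, r, n_iter=25, ridge=1e-6):
    """Newton-Raphson for P(r = 1 | x) = sigmoid(x @ beta); Xc already has an intercept."""
    beta = np.zeros(Xc.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xc @ beta)
        grad = Xc.T @ (r - p)
        hess = Xc.T @ (Xc * (p * (1.0 - p))[:, None]) + ridge * np.eye(Xc.shape[1])
        beta += np.linalg.solve(hess, grad)
    return beta


def estimate_propensities(X, R, floor=0.05):
    """Hypothetical helper: regress column j's indicator R[:, j] on X, for every j."""
    Xc = np.column_stack([np.ones(len(X)), X])
    P = np.column_stack([sigmoid(Xc @ fit_logistic(Xc, R[:, j].astype(float)))
                         for j in range(R.shape[1])])
    return np.clip(P, floor, 1.0)  # assumption: trim small propensities to cap the weights


# ---- Stage 2: weighted low-rank likelihood for mixed column types ----
# Canonical exponential families: (b(theta), b'(theta) = mean). Gaussian has unit variance.
FAMILIES = {
    "gaussian": (lambda t: 0.5 * t ** 2, lambda t: t),
    "bernoulli": (lambda t: np.logaddexp(0.0, t), sigmoid),
    "poisson": (np.exp, np.exp),
}


def weighted_nll(Theta, Y, W, types):
    """sum_ij w_ij * (b_j(theta_ij) - y_ij * theta_ij); entries with w_ij = 0 drop out."""
    return sum(np.sum(W[:, j] * (FAMILIES[k][0](Theta[:, j]) - Y[:, j] * Theta[:, j]))
               for j, k in enumerate(types))


def rank_project(Z, rank):
    """Closest matrix of rank <= `rank` in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]


def complete(Y, W, types, rank, n_iter=200):
    """Swapped-in solver: projected gradient descent with a backtracking step size."""
    W = W / W.max()                  # a positive rescaling leaves the minimiser unchanged
    Y = np.where(W > 0, Y, 0.0)      # unobserved entries (e.g. NaN) carry zero weight
    Theta, step = np.zeros(Y.shape), 1.0
    loss = weighted_nll(Theta, Y, W, types)
    for _ in range(n_iter):
        G = np.column_stack([W[:, j] * (FAMILIES[k][1](Theta[:, j]) - Y[:, j])
                             for j, k in enumerate(types)])
        for _attempt in range(30):   # halve the step until the loss does not increase
            cand = rank_project(Theta - step * G, rank)
            cand_loss = weighted_nll(cand, Y, W, types)
            if cand_loss <= loss:
                Theta, loss = cand, cand_loss
                break
            step /= 2.0
    return Theta


# ---- Usage on synthetic data (all values hypothetical) ----
rng = np.random.default_rng(0)
n, rank = 80, 2
types = ["gaussian"] * 3 + ["bernoulli"] * 3 + ["poisson"] * 3
p = len(types)
Theta_true = 0.4 * rng.normal(size=(n, rank)) @ rng.normal(size=(rank, p))
Y = np.empty((n, p))
Y[:, :3] = Theta_true[:, :3] + rng.normal(size=(n, 3))
Y[:, 3:6] = rng.random((n, 3)) < sigmoid(Theta_true[:, 3:6])
Y[:, 6:] = rng.poisson(np.exp(Theta_true[:, 6:]))

X = rng.normal(size=(n, 2))                          # row-level covariates
R = rng.random((n, p)) < sigmoid(0.8 + X @ (0.5 * rng.normal(size=(2, p))))
d = rng.uniform(1.0, 3.0, size=n)                    # survey design weights

P_hat = estimate_propensities(X, R)
W = d[:, None] * R / P_hat                           # w_ij = d_i * r_ij / pi_hat_ij
Theta_hat = complete(np.where(R, Y, np.nan), W, types, rank)
```

Because the rank projection is applied after every gradient step and a step is accepted only when the weighted loss does not increase, the returned matrix has rank at most r and a weighted loss no larger than at the zero starting point. How closely it recovers Theta_true depends on the sample size, the weights and the iteration budget, none of which this sketch tunes.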
