Imputation-Based Variable Selection Method for Block-Wise Missing Data When Integrating Multiple Longitudinal Studies.

Authors

Ouyang Zhongzhe, Wang Lu

Affiliation

Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.

Publication information

Mathematics (Basel). 2024 Apr;12(7). doi: 10.3390/math12070951. Epub 2024 Mar 23.

DOI:10.3390/math12070951

PMID:39925461

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11804884/

Abstract

When integrating data from multiple sources, a common challenge is block-wise missingness. Most existing methods address this issue only in cross-sectional studies. In this paper, we propose a method for variable selection when combining datasets from multiple sources in longitudinal studies. To account for block-wise missingness in covariates, we impute the missing values multiple times based on combinations of samples from different missing patterns and predictors from different data sources. We then use these imputed data to construct estimating equations and aggregate the information across subjects and sources with the generalized method of moments. We employ the smoothly clipped absolute deviation (SCAD) penalty for variable selection and use the extended Bayesian information criterion (EBIC) for tuning parameter selection. We establish the asymptotic properties of the proposed estimator and demonstrate the superior performance of the proposed method through numerical experiments. Furthermore, we apply the proposed method to the Alzheimer's Disease Neuroimaging Initiative study to identify sensitive early-stage biomarkers of Alzheimer's disease, which is crucial for early disease detection and personalized treatment.
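For readers unfamiliar with the smoothly clipped absolute deviation (SCAD) penalty named in the abstract, the following is a minimal sketch of its standard textbook form and its derivative. It is not taken from the paper: the function names `scad_penalty` and `scad_derivative`, the scalar interface, and the default `a = 3.7` (a conventional choice in the literature) are assumptions made for illustration, and the abstract does not say how the authors parameterize or optimize the penalty.

```python
def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty evaluated at a single coefficient.

    Assumes lam > 0 and a > 2. The value a = 3.7 is a common default,
    not something stated in the abstract.
    """
    t = abs(theta)
    if t <= lam:
        # Near zero the penalty is linear, like the lasso.
        return lam * t
    if t <= a * lam:
        # Quadratic transition that gradually relaxes the penalty.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Beyond a * lam the penalty is constant, so large effects are not shrunk.
    return lam * lam * (a + 1) / 2


def scad_derivative(theta, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |theta|, for |theta| > 0."""
    t = abs(theta)
    if t <= lam:
        return lam
    if t <= a * lam:
        return (a * lam - t) / (a - 1)
    return 0.0


if __name__ == "__main__":
    # Illustrative grid only: small, moderate, and large coefficients.
    for coef in (0.5, 2.0, 10.0):
        print(coef, scad_penalty(coef, lam=1.0), scad_derivative(coef, lam=1.0))
```

The derivative is zero for coefficients larger than `a * lam`, which is why SCAD leaves strong signals essentially unpenalized while still setting small coefficients to zero. The tuning parameter `lam` is the quantity that a criterion such as EBIC would be used to choose.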

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b67d/11804884/68ff802e1f1b/nihms-2039846-f0001.jpg
