TENSOR REGRESSION FOR INCOMPLETE OBSERVATIONS WITH APPLICATION TO LONGITUDINAL STUDIES.

Authors

Xu Tianchen, Chen Kun, Li Gen

Affiliations

Bristol Myers Squibb.

Department of Statistics, University of Connecticut.

Publication information

Ann Appl Stat. 2024 Jun;18(2):1195-1212. doi: 10.1214/23-aoas1830. Epub 2024 Apr 5.

DOI: 10.1214/23-aoas1830

PMID: 39360180

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11446469/

Abstract

Multivariate longitudinal data are frequently encountered in practice such as in our motivating longitudinal microbiome study. It is of general interest to associate such high-dimensional, longitudinal measures with some univariate continuous outcome. However, incomplete observations are common in a regular study design, as not all samples are measured at every time point, giving rise to the so-called blockwise missing values. Such missing structure imposes significant challenges for association analysis and defies many existing methods that require complete samples. In this paper we propose to represent multivariate longitudinal data as a three-way tensor array (i.e., sample-by-feature-by-time) and exploit a parsimonious scalar-on-tensor regression model for association analysis. We develop a regularized covariance-based estimation procedure that effectively leverages all available observations without imputation. The method achieves variable selection and smooth estimation of time-varying effects. The application to the motivating microbiome study reveals interesting links between the preterm infant's gut microbiome dynamics and their neurodevelopment. Additional numerical studies on synthetic data and a longitudinal aging study further demonstrate the efficacy of the proposed method.
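The idea of estimating the regression from covariances, rather than from imputed samples, can be illustrated with the minimal sketch below. It is not the authors' estimator. It rests on several assumptions of its own:

- missing entries are coded as NaN in an n x p x T (sample-by-feature-by-time) array;
- the outcome is fully observed;
- values are missing completely at random.

The sketch also swaps the paper's parsimonious tensor structure, variable selection, and smoothness penalties for a plain ridge penalty. Each covariance entry is computed only from the samples in which both of its entries are observed. The resulting matrix is projected onto the positive semidefinite cone, because available-case estimates need not be PSD. All function names and the simulated data are hypothetical.

```python
import numpy as np


def available_case_moments(X, y):
    """Estimate Cov(vec x) and Cov(vec x, y) from an n x p x T tensor whose
    missing entries are NaN, using every observed pair instead of imputing."""
    n, p, T = X.shape
    Z = X.reshape(n, p * T)              # row i = vec of sample i's p x T slice
    O = (~np.isnan(Z)).astype(float)     # 1.0 where observed, 0.0 where missing
    mu = np.nanmean(Z, axis=0)           # available-case column means
    Zc = np.where(O > 0, Z - mu, 0.0)    # centred; missing entries contribute 0
    yc = y - y.mean()                    # outcome assumed fully observed
    pair_counts = O.T @ O                # samples where entries j and k are both seen
    Sxx = (Zc.T @ Zc) / np.maximum(pair_counts, 1.0)
    Sxy = (Zc.T @ yc) / np.maximum(O.sum(axis=0), 1.0)
    return Sxx, Sxy


def ridge_from_moments(Sxx, Sxy, lam):
    """Minimise 0.5 b'Sxx b - b'Sxy + 0.5 lam ||b||^2 after projecting Sxx
    onto the PSD cone (pairwise estimates need not be PSD)."""
    w, V = np.linalg.eigh((Sxx + Sxx.T) / 2.0)
    Sxx_psd = (V * np.clip(w, 0.0, None)) @ V.T
    return np.linalg.solve(Sxx_psd + lam * np.eye(len(Sxy)), Sxy)


def simulate_blockwise(n=2000, miss_prob=0.3, seed=0):
    """Hypothetical toy data: 5 features, 4 time points, rank-1 true coefficient
    matrix; each sample misses whole time points (all features at once)."""
    rng = np.random.default_rng(seed)
    B = np.outer([1.0, -1.0, 0.5, 0.0, 0.0], [1.0, 0.5, 0.0, -0.5])
    X = rng.standard_normal((n, 5, 4))
    y = np.einsum("ipt,pt->i", X, B) + 0.1 * rng.standard_normal(n)
    missing_time = rng.random((n, 4)) < miss_prob   # (sample, time) not measured
    X[np.repeat(missing_time[:, None, :], 5, axis=1)] = np.nan
    return X, y, B


# Usage: fit on blockwise-missing data without imputing any entry.
X, y, B_true = simulate_blockwise()
Sxx, Sxy = available_case_moments(X, y)
B_hat = ridge_from_moments(Sxx, Sxy, lam=0.1).reshape(B_true.shape)
print(np.round(B_hat, 2))
```

The coefficient vector is reshaped back to a feature-by-time matrix, so each row can be read as one feature's time-varying effect. The ridge step only keeps the example short. A faithful version would impose the low-rank, sparsity, and smoothness structure described above.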
