Xu Tianchen, Chen Kun, Li Gen
Bristol Myers Squibb.
Department of Statistics, University of Connecticut.
Ann Appl Stat. 2024 Jun;18(2):1195-1212. doi: 10.1214/23-aoas1830. Epub 2024 Apr 5.
Multivariate longitudinal data are frequently encountered in practice, for example in our motivating longitudinal microbiome study. A question of general interest is how to associate such high-dimensional longitudinal measures with a univariate continuous outcome. However, incomplete observations are common in regular study designs because not all samples are measured at every time point, which gives rise to so-called blockwise missing values. This missing structure poses significant challenges for association analysis and rules out many existing methods that require complete samples. In this paper we propose to represent multivariate longitudinal data as a three-way tensor array (i.e., sample-by-feature-by-time) and exploit a parsimonious scalar-on-tensor regression model for association analysis. We develop a regularized covariance-based estimation procedure that effectively leverages all available observations without imputation. The method achieves variable selection and smooth estimation of time-varying effects. Applying it to the motivating microbiome study reveals interesting links between preterm infants' gut microbiome dynamics and their neurodevelopment. Additional numerical studies on synthetic data and on a longitudinal aging study further demonstrate the efficacy of the proposed method.
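The general idea of estimating from covariances rather than complete samples can be illustrated with a small sketch. It assumes the data sit in an `n × p × T` (sample-by-feature-by-time) array, with unmeasured visits encoded as `NaN`. Each covariance entry is averaged over only the samples that observed that pair (an available-case estimate), and the coefficients are then obtained from the regularized normal equations. This is not the authors' estimator. The paper's parsimonious model, smoothness penalty and variable selection are replaced here by a plain ridge penalty. The function names `available_case_moments` and `fit_ridge_from_moments` and the synthetic data are hypothetical, chosen for illustration only.

```python
import numpy as np


def available_case_moments(X, y):
    """Moment estimates from a (sample, feature, time) array whose missing
    blocks are np.nan, using every observed entry (no imputation)."""
    n = X.shape[0]
    Z = X.reshape(n, -1)                      # unfold: column = feature * T + time
    obs = ~np.isnan(Z)                        # True where the entry was measured
    counts = obs.sum(axis=0)                  # samples observed per column
    means = np.where(obs, Z, 0.0).sum(axis=0) / np.maximum(counts, 1)
    Zc = np.where(obs, Z - means, 0.0)        # centered; missing entries contribute 0
    yc = y - y.mean()
    O = obs.astype(float)
    pair_n = O.T @ O                          # samples jointly observed per column pair
    S = (Zc.T @ Zc) / np.maximum(pair_n, 1)   # pairwise-available covariance
    c = (Zc.T @ yc) / np.maximum(counts, 1)   # cross-covariance with the outcome
    return S, c


def fit_ridge_from_moments(S, c, p, T, lam=1e-2):
    """Solve (S + lam * I) b = c and fold b back into a p-by-T coefficient
    matrix. Ridge is a stand-in penalty; it also guards against S not being PSD."""
    b = np.linalg.solve(S + lam * np.eye(S.shape[0]), c)
    return b.reshape(p, T)


# Hypothetical synthetic example: 3 features, 4 visits, time-varying effects.
rng = np.random.default_rng(0)
n, p, T = 400, 3, 4
B_true = np.array([[1.0, 0.8, 0.6, 0.4],     # effect fading over time
                   [0.0, 0.0, 0.0, 0.0],     # irrelevant feature
                   [-0.5, -0.5, -0.5, -0.5]])  # constant negative effect
X = rng.standard_normal((n, p, T))
y = np.einsum("npt,pt->n", X, B_true) + 0.1 * rng.standard_normal(n)

# Blockwise missingness: about 30% of samples skip one whole visit (all features).
for i in np.flatnonzero(rng.random(n) < 0.3):
    X[i, :, rng.integers(T)] = np.nan

S, c = available_case_moments(X, y)
B_hat = fit_ridge_from_moments(S, c, p, T)
print(np.round(B_hat, 2))
```

Because each entry of `S` may be averaged over a different subset of samples, the assembled matrix is not guaranteed to be positive semi-definite. Some regularization is needed before solving, which is one reason a penalized, covariance-based formulation pairs naturally with blockwise missing data.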