Department of Statistics and Institute of Data Science, National Cheng Kung University, Tainan, Taiwan.
Department of Statistics, Sungkyunkwan University, Seoul, South Korea.
Stat Med. 2024 Apr 15;43(8):1527-1548. doi: 10.1002/sim.10029. Epub 2024 Feb 6.
When analyzing multivariate longitudinal binary data, we estimate the effects of the covariates on the responses while accounting for three types of complex correlations present in the data: correlations within each response over time, cross-correlations between different responses at different times, and correlations between different responses at the same time point. The number of correlation parameters therefore increases quadratically with the dimension of the correlation matrix, which makes estimation difficult, and the estimated correlation matrix must also satisfy the positive definiteness constraint. The correlation matrix may additionally be heteroscedastic; however, its structure is commonly assumed to be homoscedastic and constrained, for example exchangeable or autoregressive of order one. These assumptions are overly strong and result in biased estimates of the covariate effects on the responses. Hence, we propose probit linear mixed models for multivariate longitudinal binary data in which the correlation matrix is estimated using the hypersphere decomposition instead of the strong assumptions noted above. Simulations and real examples are used to demonstrate the proposed methods. An open-source R package, BayesMGLM, is available on GitHub at https://github.com/kuojunglee/BayesMGLM/ with full documentation for reproducing the results.
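To illustrate why the hypersphere decomposition removes the need for an explicit positive definiteness check, the following minimal sketch builds a correlation matrix from unconstrained angles in (0, pi). It assumes the commonly used parameterization in which the correlation matrix is written as L L^T, with L lower triangular and each row of L a point on a unit hypersphere expressed through sines and cosines of the angles. The function name `hypersphere_to_correlation` and the example angles are hypothetical and chosen for illustration only; this is not the BayesMGLM API, and the abstract does not specify the exact parameterization the authors use.

```python
import math


def hypersphere_to_correlation(angles):
    """Map a lower-triangular array of angles to a correlation matrix.

    angles[i][j] for j < i holds theta_ij in (0, pi); entries with
    j >= i are ignored. Illustrative sketch, not the package's code.
    """
    n = len(angles)
    # Build the lower-triangular factor L row by row.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sin_prod = 1.0  # running product of sines along row i
        for j in range(i):
            theta = angles[i][j]
            L[i][j] = math.cos(theta) * sin_prod
            sin_prod *= math.sin(theta)
        # The diagonal entry makes the row have unit Euclidean norm.
        L[i][i] = sin_prod
    # R = L L^T has unit diagonal and is positive definite by construction
    # whenever every angle lies strictly inside (0, pi).
    return [
        [sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical angles for a 3 x 3 correlation matrix.
example_angles = [
    [0.0, 0.0, 0.0],
    [math.pi / 3, 0.0, 0.0],
    [math.pi / 2, math.pi / 4, 0.0],
]
R = hypersphere_to_correlation(example_angles)
```

Because each row of L has unit norm, the diagonal of R equals one for any choice of angles, so an estimation routine can update the angles freely without ever producing an invalid correlation matrix. No structure such as exchangeable or autoregressive of order one is imposed on the off-diagonal entries.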