Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands.
Department of Theory, Methodology, and Statistics, Open University of the Netherlands, Heerlen, The Netherlands.
BMC Med Res Methodol. 2023 Oct 5;23(1):220. doi: 10.1186/s12874-023-02034-z.
In medical, social, and behavioral research we often encounter datasets with a multilevel structure and multiple correlated dependent variables. These data are frequently collected from a study population that comprises several subpopulations with different (i.e., heterogeneous) effects of an intervention. Although such data occur frequently, methods to analyze them are less common, and researchers often resort to ignoring the multilevel and/or heterogeneous structure, analyzing only a single dependent variable, or a combination of these. These analysis strategies are suboptimal: ignoring multilevel structures inflates Type I error rates, while neglecting the multivariate or heterogeneous structure masks detailed insights.
To analyze such data comprehensively, the current paper presents a novel Bayesian multilevel multivariate logistic regression model. The clustered structure of multilevel data is taken into account, such that posterior inferences can be made with accurate error rates. Further, the model shares information between different subpopulations in the estimation of average and conditional average multivariate treatment effects. To facilitate interpretation, multivariate logistic regression parameters are transformed to posterior success probabilities and differences between them.
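The transformation from regression parameters to success probabilities can be illustrated with a minimal sketch. It assumes two binary outcomes whose four joint response categories are modeled with a multinomial logit link, a single treatment indicator, and no multilevel terms; the function names, the category ordering, and the simulated "posterior draws" are all hypothetical and serve only to show the arithmetic, not the authors' implementation.

```python
import math
import random

# Assumed ordering of joint response categories for two binary outcomes (y1, y2).
# The last category, (0, 0), is treated as the reference with linear predictor 0.
CATEGORIES = [(1, 1), (1, 0), (0, 1), (0, 0)]


def joint_probs(linear_predictors):
    """Softmax over the joint categories; input holds the 3 non-reference predictors."""
    psi = list(linear_predictors) + [0.0]
    m = max(psi)  # subtract the maximum for numerical stability
    expd = [math.exp(p - m) for p in psi]
    total = sum(expd)
    return [e / total for e in expd]


def success_probs(phi):
    """Marginal success probability per outcome, summed from joint probabilities."""
    theta1 = sum(p for p, (y1, _) in zip(phi, CATEGORIES) if y1 == 1)
    theta2 = sum(p for p, (_, y2) in zip(phi, CATEGORIES) if y2 == 1)
    return theta1, theta2


def treatment_differences(draws):
    """Per-draw difference in success probabilities (treatment minus control).

    Each draw is a list of three (intercept, treatment_effect) pairs,
    one pair per non-reference joint category.
    """
    diffs = []
    for draw in draws:
        phi_treat = joint_probs([b0 + b1 for b0, b1 in draw])
        phi_ctrl = joint_probs([b0 for b0, _ in draw])
        th_t = success_probs(phi_treat)
        th_c = success_probs(phi_ctrl)
        diffs.append((th_t[0] - th_c[0], th_t[1] - th_c[1]))
    return diffs


def fake_posterior_draws(n, seed=1):
    """Stand-in for MCMC output: made-up values, NOT results from the paper."""
    rng = random.Random(seed)
    centers = [(0.2, 0.5), (-0.1, 0.3), (0.0, 0.1)]
    return [
        [(rng.gauss(b0, 0.1), rng.gauss(b1, 0.1)) for b0, b1 in centers]
        for _ in range(n)
    ]


draws = fake_posterior_draws(2000)
diffs = treatment_differences(draws)

# Posterior mean difference per outcome, and the posterior probability
# that the treatment is better on both outcomes simultaneously.
mean_delta = [sum(d[k] for d in diffs) / len(diffs) for k in range(2)]
prob_both_better = sum(d[0] > 0 and d[1] > 0 for d in diffs) / len(diffs)
print(mean_delta, prob_both_better)
```

Because the transformation is applied to every draw, the resulting differences carry the full posterior uncertainty. In a multilevel or covariate-adjusted version, cluster-specific effects and covariates would presumably enter the linear predictors before the softmax step.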
A numerical evaluation compared our framework to less comprehensive alternatives and highlighted the need to model the multilevel structure: treatment comparisons based on the multilevel model attained the targeted Type I error rates, while single-level alternatives resulted in inflated Type I error rates. Further, the multilevel model was more powerful than a single-level model when the number of clusters was higher. A re-analysis of the Third International Stroke Trial data illustrated how incorporating a multilevel structure, assessing treatment heterogeneity, and combining dependent variables contributed to an in-depth understanding of treatment effects. Further, we demonstrated how Bayes factors can aid in the selection of a suitable model.
The method is useful for predicting treatment effects and for decision-making within subpopulations from multiple clusters. It takes advantage of the size of the entire study sample and incorporates uncertainty in a principled probabilistic manner using the full posterior distribution.