Briseño Sanchez Guillermo, Klein Nadja, Klinkhammer Hannah, Mayr Andreas
Methods for Big Data, Scientific Computing Center, Karlsruhe Institute of Technology, Karlsruhe, Germany.
Department of Medical Biometrics, Informatics and Epidemiology, Faculty of Medicine, University of Bonn, Bonn, Germany.
Stat Methods Med Res. 2025 May;34(5):887-902. doi: 10.1177/09622802241313294. Epub 2025 Mar 21.
Motivated by challenges in the analysis of biomedical data and observational studies, we develop statistical boosting for the general class of bivariate distributional copula regression models with arbitrary marginal distributions, making the approach suitable for binary, count, continuous or mixed outcomes. To arrive at a flexible model for the entire conditional distribution, not only the marginal distribution parameters but also the copula parameters are related to covariates through additive predictors. We suggest estimation by means of an adapted component-wise gradient boosting algorithm. Key benefits of boosting over classical likelihood or Bayesian estimation are its implicit data-driven variable selection mechanism and shrinkage. To the best of our knowledge, our implementation is the only one that combines a wide range of covariate effects, marginal distributions, copula functions, and implicit data-driven variable selection. We showcase the versatility of our approach in applications to data from genetic epidemiology, healthcare utilization and childhood undernutrition. Our developments are implemented in the R package gamboostLSS, fostering transparent and reproducible research.
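To illustrate why component-wise gradient boosting yields variable selection and shrinkage, the following is a minimal sketch in Python under strong simplifying assumptions. It uses a single predictor with squared-error loss and simple linear base-learners, not the bivariate copula likelihood of the paper, and it does not reproduce the gamboostLSS interface. The function name `componentwise_l2_boost`, its parameters `n_steps` and `nu`, and the simulated data are hypothetical and chosen for illustration only.

```python
import random


def componentwise_l2_boost(X, y, n_steps=100, nu=0.1):
    """Illustrative component-wise boosting with squared-error loss.

    Returns the offset (mean of y) and one coefficient per covariate,
    where covariates are mean-centered internally.
    """
    n, p = len(X), len(X[0])
    # Mean-center every covariate so base-learners need no intercept.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]
    sum_sq = [sum(v * v for v in col) for col in cols]

    intercept = sum(y) / n
    coefs = [0.0] * p
    fit = [intercept] * n

    for _ in range(n_steps):
        # Negative gradient of 0.5 * (y - f)^2 with respect to f: the residual.
        resid = [yi - fi for yi, fi in zip(y, fit)]
        best = None
        for j in range(p):
            if sum_sq[j] == 0.0:
                continue  # constant covariate carries no information
            # Least-squares fit of the residual on covariate j alone.
            beta = sum(x * r for x, r in zip(cols[j], resid)) / sum_sq[j]
            rss = sum((r - beta * x) ** 2 for x, r in zip(cols[j], resid))
            if best is None or rss < best[0]:
                best = (rss, j, beta)
        if best is None:
            break
        _, j, beta = best
        # Update only the best-fitting component, damped by step length nu.
        coefs[j] += nu * beta
        fit = [f + nu * beta * x for f, x in zip(fit, cols[j])]

    return intercept, coefs


# Simulated data (an assumption for this sketch): only covariates 0 and 2 matter.
random.seed(1)
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [2.0 * row[0] - 1.0 * row[2] + random.gauss(0, 0.1) for row in X]

intercept, coefs = componentwise_l2_boost(X, y)
print([round(c, 2) for c in coefs])
```

Because each iteration updates only the single best-fitting component by a small step, covariates that are never selected before stopping keep a coefficient of exactly zero, and stopping early leaves the selected coefficients shrunken toward zero. In practice the number of iterations would be tuned, for example by cross-validation.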