Olaniran Oyebayo Ridwan, Olaniran Saidat Fehintola, Allohibi Jeza, Alharbi Abdulmajeed Atiah, Alharbi Nada MohammedSaeed
Department of Statistics, Faculty of Physical Sciences, University of Ilorin, Ilorin, Kwara State, PMB 1515, Nigeria.
Department of Biostatistics & Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.
Sci Rep. 2025 Aug 22;15(1):30927. doi: 10.1038/s41598-025-16526-z.
High-dimensional longitudinal data present significant analytical challenges due to intricate within-subject correlations and an overwhelming ratio of predictors to observations. To address these challenges, we introduce Mixed-Effect Gradient Boosting (MEGB), a novel R package that synergises gradient boosting with mixed-effects modelling to simultaneously account for population-level fixed effects and subject-specific random variability. MEGB provides a unified framework for analysing repeated measures data that accommodates complex covariance structures while harnessing gradient boosting's inherent regularisation for robust feature selection and prediction. In comprehensive simulations spanning linear and nonlinear data-generating processes, MEGB achieved 35-76% lower mean squared error (MSE) than state-of-the-art alternatives such as Mixed-Effect Random Forests (MERF) and REEMForest, while maintaining 55-70% true positive rates for variable selection in ultra-high-dimensional regimes. To demonstrate practical utility, we applied MEGB to maternal cell-free plasma RNA data (subjects, transcripts), where it identified 9 key placental transcripts driving fetal RNA dynamics across pregnancy trimesters.
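The idea of combining boosting with a mixed-effects model can be illustrated with a minimal sketch. It is not the MEGB package or its API: it assumes a random-intercept-only model, uses single-split stumps in place of full regression trees, and alternates between boosting the fixed part and EM-style updates of the subject intercepts and variance components. All function and variable names (`fit_stump`, `boost`, `fit_mixed_boost`, the toy data) are hypothetical.

```python
from collections import defaultdict


def fit_stump(xs, residuals):
    """Best single-split stump under squared error: (feature, threshold, left_mean, right_mean)."""
    best = None
    for f in range(len(xs[0])):
        # Candidate thresholds: every distinct value except the largest,
        # so both sides of the split are always non-empty.
        for t in sorted({row[f] for row in xs})[:-1]:
            left = [r for row, r in zip(xs, residuals) if row[f] <= t]
            right = [r for row, r in zip(xs, residuals) if row[f] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:  # all features constant: fall back to a constant fit
        mean = sum(residuals) / len(residuals)
        return 0, float("inf"), mean, mean
    return best[1:]


def boost(xs, ys, n_rounds=50, shrinkage=0.1):
    """Gradient boosting for squared loss; each round fits a stump to the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        f, t, lm, rm = fit_stump(xs, residuals)
        stumps.append((f, t, shrinkage * lm, shrinkage * rm))
        preds = [p + shrinkage * (lm if row[f] <= t else rm) for row, p in zip(xs, preds)]
    return base, stumps


def predict_fixed(model, row):
    """Population-level (fixed-effect) prediction for one feature row."""
    base, stumps = model
    return base + sum(lv if row[f] <= t else rv for f, t, lv, rv in stumps)


def fit_mixed_boost(xs, ys, ids, n_outer=10, **boost_args):
    """Alternate boosting of the fixed part with EM updates of random intercepts."""
    groups = defaultdict(list)
    for k, i in enumerate(ids):
        groups[i].append(k)
    b = {i: 0.0 for i in groups}      # subject-specific random intercepts
    var_b, var_e = 1.0, 1.0           # assumed starting variance components
    for _ in range(n_outer):
        # Fixed part: boost on the response with current intercepts removed.
        model = boost(xs, [y - b[i] for y, i in zip(ys, ids)], **boost_args)
        resid = [y - predict_fixed(model, row) for y, row in zip(ys, xs)]
        # E-step: posterior mean and variance of each random intercept.
        post_var = {}
        for i, idx in groups.items():
            denom = len(idx) * var_b + var_e
            b[i] = var_b * sum(resid[k] for k in idx) / denom
            post_var[i] = var_b * var_e / denom
        # M-step: update the between-subject and residual variances.
        var_b = sum(b[i] ** 2 + post_var[i] for i in groups) / len(groups)
        var_e = sum(
            sum((resid[k] - b[i]) ** 2 for k in idx) + len(idx) * post_var[i]
            for i, idx in groups.items()
        ) / len(ys)
    return model, b, var_b, var_e


# Toy balanced design: 4 subjects, 5 visits each, a step effect plus a subject offset.
offsets = [-2.0, -1.0, 1.0, 2.0]
xs = [[float(j)] for i in range(4) for j in range(5)]
ids = [i for i in range(4) for j in range(5)]
ys = [(3.0 if j >= 2 else 0.0) + offsets[i] for i in range(4) for j in range(5)]

model, b, var_b, var_e = fit_mixed_boost(xs, ys, ids)
```

In this sketch a subject-level prediction is `predict_fixed(model, row) + b[subject]`, while a prediction for an unseen subject would use the fixed part alone. The shrinkage applied to each intercept depends on the ratio of the two variance components, which is the usual mixed-model behaviour; a full implementation would also handle random slopes and richer covariance structures.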