Mixed effect gradient boosting for high-dimensional longitudinal data.

Authors

Olaniran Oyebayo Ridwan, Olaniran Saidat Fehintola, Allohibi Jeza, Alharbi Abdulmajeed Atiah, Alharbi Nada MohammedSaeed

Affiliations

Department of Statistics, Faculty of Physical Sciences, University of Ilorin, Ilorin, Kwara State, PMB 1515, Nigeria.

Department of Biostatistics & Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.

Publication details

Sci Rep. 2025 Aug 22;15(1):30927. doi: 10.1038/s41598-025-16526-z.

Abstract

High-dimensional longitudinal data present significant analytical challenges due to intricate within-subject correlations and an overwhelming ratio of predictors to observations. To address these challenges, we introduce Mixed-Effect Gradient Boosting (MEGB), a novel R package that synergises gradient boosting with mixed-effects modelling to simultaneously account for population-level fixed effects and subject-specific random variability. MEGB provides a unified framework for analysing repeated measures data that accommodates complex covariance structures while harnessing gradient boosting's inherent regularisation for robust feature selection and prediction. In comprehensive simulations spanning linear and nonlinear data-generating processes, MEGB achieved 35-76% lower mean squared error (MSE) than state-of-the-art alternatives such as Mixed-Effect Random Forests (MERF) and REEMForest, while maintaining 55-70% true positive rates for variable selection in ultra-high-dimensional regimes. Demonstrating practical utility, we applied MEGB to maternal cell-free plasma RNA data (subjects, transcripts), where it identified 9 key placental transcripts driving fetal RNA dynamics across pregnancy trimesters.
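
The general idea of combining gradient boosting with a mixed-effects model can be illustrated with a small sketch. The Python code below is not the MEGB package or its API; it is a toy under stated assumptions: a single random intercept per subject, decision stumps as base learners, squared-error loss, and textbook EM updates for the two variance components. All function names (`fit_stump`, `boost`, `fit_mixed_boost`, `simulate`) and the simulated data are hypothetical and chosen only for illustration.

```python
import random

def fit_stump(X, r):
    """Least-squares decision stump; returns (feature, threshold, left_value, right_value)."""
    n, p = len(X), len(X[0])
    total = sum(r)
    best = None
    for j in range(p):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):  # k = number of rows sent to the left child
            left_sum += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical values
            right_sum = total - left_sum
            # Maximising this gain is equivalent to minimising the split's squared error.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    if best is None:  # every feature is constant: fall back to the mean
        return 0, float("inf"), total / n, total / n
    return best[1:]

def boost(X, y, rounds=30, lr=0.3):
    """Gradient boosting for squared loss; returns the training-set predictions."""
    pred = [sum(y) / len(y)] * len(y)
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient
        j, thr, lv, rv = fit_stump(X, resid)
        pred = [pi + lr * (lv if x[j] <= thr else rv) for x, pi in zip(X, pred)]
    return pred

def fit_mixed_boost(X, y, groups, outer=5):
    """Alternate between boosting the fixed part and updating random intercepts."""
    ids = sorted(set(groups))
    b = {g: 0.0 for g in ids}
    s2_b, s2_e = 1.0, 1.0  # assumed starting values for the variance components
    for _ in range(outer):
        # (1) Fixed part: boost on the response with current random intercepts removed.
        f_hat = boost(X, [yi - b[g] for yi, g in zip(y, groups)])
        # (2) Random part: shrunken per-subject residual means, then EM variance updates.
        new_s2_b = new_s2_e = 0.0
        for g in ids:
            r = [yi - fi for yi, fi, gi in zip(y, f_hat, groups) if gi == g]
            n_g = len(r)
            b[g] = s2_b * sum(r) / (n_g * s2_b + s2_e)   # posterior mean of the intercept
            v = s2_b * s2_e / (n_g * s2_b + s2_e)        # posterior variance of the intercept
            new_s2_b += b[g] ** 2 + v
            new_s2_e += sum((ri - b[g]) ** 2 for ri in r) + n_g * v
        s2_b, s2_e = new_s2_b / len(ids), new_s2_e / len(y)
    return f_hat, b, s2_b, s2_e

def simulate(n_subjects=20, n_obs=5, p=5, seed=0):
    """Toy longitudinal data: y = 3 * x0 + subject intercept + noise."""
    rng = random.Random(seed)
    X, y, groups = [], [], []
    for g in range(n_subjects):
        b_g = rng.gauss(0, 2)
        for _ in range(n_obs):
            x = [rng.uniform(-1, 1) for _ in range(p)]
            X.append(x)
            y.append(3 * x[0] + b_g + rng.gauss(0, 0.5))
            groups.append(g)
    return X, y, groups

X, y, groups = simulate()
f_hat, b, s2_b, s2_e = fit_mixed_boost(X, y, groups)
mse_fixed = sum((yi - fi) ** 2 for yi, fi in zip(y, f_hat)) / len(y)
mse_mixed = sum((yi - fi - b[g]) ** 2 for yi, fi, g in zip(y, f_hat, groups)) / len(y)
print("training MSE, fixed part only:", mse_fixed)
print("training MSE, fixed + random intercepts:", mse_mixed)
```

On the training data, adding the shrunken subject intercepts cannot increase the squared error relative to the boosted fixed part alone, which is the within-subject correlation that a purely fixed-effects learner leaves unexplained. A real high-dimensional analysis would need richer random-effects structures, held-out evaluation, and a variable-selection step, none of which this sketch attempts.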
