

Federated learning algorithms for generalized mixed-effects model (GLMM) on horizontally partitioned data from distributed sources.

Affiliations

School of Biomedical Informatics, UTHealth, 7000 Fannin St, Houston, 77030, TX, USA.

Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, 3400 Civic Center Boulevard, Philadelphia, 19104, PA, USA.

Publication information

BMC Med Inform Decis Mak. 2022 Oct 16;22(1):269. doi: 10.1186/s12911-022-02014-1.

Abstract

OBJECTIVES

This paper developed federated solutions, based on two approximation algorithms, for fitting generalized linear mixed-effects models (GLMMs). It also proposed a solution to numerical errors and singularity issues. It then showed that both proposed methods perform well in revealing the significance of parameters in distributed datasets, using a centralized GLMM algorithm from the R package lme4 as the baseline model.

METHODS

The log-likelihood function of the GLMM is approximated by two numerical methods, the Laplace approximation (LA) and Gauss-Hermite quadrature (GH). Both approximations support a federated decomposition of the GLMM that brings computation to the data. The federated setting introduces numerical errors and singularity issues. To tackle them, a lossless log-sum-exp trick and an adaptive regularization strategy were used.
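The sketch below shows how an approximated log-likelihood can decompose across sites. It assumes the simplest case: a random-intercept logistic GLMM in which each site holds exactly one random-effect group. Several pieces are illustrative assumptions, not the paper's implementation:

- the function names (`site_loglik`, `federated_loglik`, and so on);
- the simulated data;
- the fixed 20-node, non-adaptive Gauss-Hermite rule.

The adaptive regularization strategy is not shown. Each site computes its own log-likelihood term from local data and sends only that scalar to the server, which sums the terms. The log-sum-exp shift is what keeps a large site's term from underflowing to log(0).

```python
import numpy as np

# Minimal sketch (hypothetical names), random-intercept logistic GLMM:
#   y_ij ~ Bernoulli(sigmoid(x_ij' beta + b_i)),  b_i ~ N(0, sigma^2)
# Assumption: each site i holds exactly one random-effect group.

def log_sigmoid(z):
    # log(1 / (1 + exp(-z))), computed without overflow
    return -np.logaddexp(0.0, -z)

def logsumexp(a):
    # log(sum(exp(a))) after shifting by max(a): the result is unchanged,
    # but exp() no longer underflows to 0 for very negative entries
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def node_logliks(X, y, beta, sigma, n_nodes):
    # Bernoulli log-likelihood of the whole site at each Gauss-Hermite node
    t, w = np.polynomial.hermite.hermgauss(n_nodes)  # rule for weight exp(-t^2)
    b = np.sqrt(2.0) * sigma * t                     # change of variables b = sqrt(2)*sigma*t
    eta = (X @ beta)[:, None] + b[None, :]           # shape (n_obs, n_nodes)
    ll = y[:, None] * log_sigmoid(eta) + (1.0 - y[:, None]) * log_sigmoid(-eta)
    return np.log(w), ll.sum(axis=0)

def site_loglik(X, y, beta, sigma, n_nodes=20):
    # Runs at the site: stable GH approximation of log ∫ p(y | b) N(b; 0, sigma^2) db
    log_w, ll = node_logliks(X, y, beta, sigma, n_nodes)
    return logsumexp(log_w + ll) - 0.5 * np.log(np.pi)

def site_loglik_naive(X, y, beta, sigma, n_nodes=20):
    # Same quantity, but exponentiates before summing, so it underflows for large sites
    log_w, ll = node_logliks(X, y, beta, sigma, n_nodes)
    with np.errstate(divide="ignore"):
        return np.log(np.sum(np.exp(log_w + ll)) / np.sqrt(np.pi))

def federated_loglik(sites, beta, sigma):
    # Runs at the server: only one scalar per site is transmitted
    return sum(site_loglik(X, y, beta, sigma) for X, y in sites)

# Hypothetical simulated sites; the last one is deliberately large
rng = np.random.default_rng(0)
beta_true, sigma_true = np.array([-0.5, 1.0]), 0.8
sites = []
for n in (50, 80, 5000):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    b_i = rng.normal(scale=sigma_true)
    p = 1.0 / (1.0 + np.exp(-(X @ beta_true + b_i)))
    sites.append((X, (rng.random(n) < p).astype(float)))

print(federated_loglik(sites, beta_true, sigma_true))
X_big, y_big = sites[-1]
print(site_loglik_naive(X_big, y_big, beta_true, sigma_true))  # -inf: exp() underflowed to 0
```

To actually fit the model, an optimizer at the server would repeatedly request these per-site terms for new values of `beta` and `sigma`. This sketch only evaluates the objective once.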

RESULTS

Our proposed methods can fit GLMMs that accommodate hierarchical data with multiple non-independent levels of observations in a federated setting. Experiments on simulated and real-world data demonstrate comparable (LA) and superior (GH) performance.
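The toy comparison below runs both approximations on a single cluster and illustrates why GH can be more accurate than LA. The values are hypothetical, not the paper's data. LA replaces the integrand with a Gaussian centered at its mode, whereas GH evaluates the integrand at several nodes. Both are plain textbook forms and may differ from the variants the paper uses:

- LA finds the mode by bisection on the monotone score.
- GH is non-adaptive, with a 100-node GH value serving as the reference.

```python
import numpy as np

# Toy comparison for ONE cluster of a random-intercept logistic model:
#   L = ∫ prod_j Bernoulli(y_j | sigmoid(eta0_j + b)) N(b; 0, sigma^2) db
# eta0 (the fixed-effect part x_j' beta) and y below are hypothetical values.

def loglik_at(eta0, y, b):
    # sum_j log p(y_j | b), vectorized over an array of b values
    eta = eta0[:, None] + np.atleast_1d(b)[None, :]
    return (-(y[:, None] * np.logaddexp(0.0, -eta))
            - (1.0 - y[:, None]) * np.logaddexp(0.0, eta)).sum(axis=0)

def gauss_hermite(eta0, y, sigma, n_nodes):
    # Non-adaptive GH with substitution b = sqrt(2)*sigma*t, summed via log-sum-exp
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    a = np.log(w) + loglik_at(eta0, y, np.sqrt(2.0) * sigma * t)
    m = a.max()
    return m + np.log(np.exp(a - m).sum()) - 0.5 * np.log(np.pi)

def laplace(eta0, y, sigma):
    # h(b) = log-likelihood + log-prior kernel; LA expands h around its mode
    def score(b):  # h'(b), strictly decreasing in b
        p = 1.0 / (1.0 + np.exp(-(eta0 + b)))
        return np.sum(y - p) - b / sigma**2
    # h'(b) > 0 for b < -n*sigma^2 and h'(b) < 0 for b > n*sigma^2, so this bracket holds
    lo = -(len(y) * sigma**2 + 1.0)
    hi = -lo
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    b_hat = 0.5 * (lo + hi)
    p = 1.0 / (1.0 + np.exp(-(eta0 + b_hat)))
    h = loglik_at(eta0, y, b_hat)[0] - b_hat**2 / (2.0 * sigma**2)
    curvature = np.sum(p * (1.0 - p)) + 1.0 / sigma**2  # -h''(b_hat)
    return h - 0.5 * np.log(sigma**2 * curvature)

eta0 = np.array([0.3, -0.2, 0.8, 0.1, -0.5])
y = np.array([1.0, 1.0, 0.0, 1.0, 0.0])
sigma = 2.0

reference = gauss_hermite(eta0, y, sigma, n_nodes=100)  # treated as "exact"
err_la = abs(laplace(eta0, y, sigma) - reference)
err_gh = abs(gauss_hermite(eta0, y, sigma, n_nodes=20) - reference)
print(f"LA error: {err_la:.2e}   GH(20) error: {err_gh:.2e}")
```

With few observations per cluster and a large random-effect variance, the conditional distribution of b is visibly non-Gaussian, and that is where LA loses accuracy. As the cluster size grows, the two approximations come closer together.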

CONCLUSION

We modified and compared federated GLMMs with different approximations. These models can support researchers in analyzing versatile biomedical data by accommodating mixed effects and addressing non-independence due to hierarchical structures (e.g., institutes, regions, countries).


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cd6/9571462/2d2b3849a4c2/12911_2022_2014_Fig1_HTML.jpg
