FedGMMAT: Federated generalized linear mixed model association tests.

Affiliations

McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, United States of America.

School of Public Health, University of Texas Health Science Center at Houston, Houston, Texas, United States of America.

Publication information

PLoS Comput Biol. 2024 Jul 24;20(7):e1012142. doi: 10.1371/journal.pcbi.1012142. eCollection 2024 Jul.

Abstract

Increasing the size of genetic and phenotypic datasets is critical for understanding the genetic determinants of diseases. The lack of practical means for collaboration and data sharing among institutions is a fundamental methodological barrier to performing high-powered studies. As samples become more heterogeneous, complex statistical approaches, such as generalized linear mixed effects models, must be used to correct for confounders that may bias results. At the same time, because of privacy concerns around Protected Health Information (PHI), the sharing of genetic information is restricted by regulations such as the Health Insurance Portability and Accountability Act (HIPAA). This limits data sharing among institutions and hampers efforts to carry out high-powered collaborative studies. Federated approaches are promising for alleviating these privacy and performance issues, since sensitive data never leaves the local sites. Motivated by these challenges, we developed FedGMMAT, a federated genetic association testing tool that uses a federated statistical testing approach for efficient association tests that can correct for confounding fixed effects and additive polygenic random effects across collaborating sites. Genetic data is never shared among collaborating sites, and the intermediate statistics are protected by encryption. Using simulated and real datasets, we demonstrate that FedGMMAT can achieve virtually the same results as pooled analysis under a privacy-preserving framework with practical resource requirements.
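The general idea of matching a pooled analysis without moving individual-level data can be illustrated with a much simpler model than the one the abstract describes. The sketch below is not FedGMMAT's algorithm: it assumes an ordinary linear model with fixed effects only, leaves out the polygenic random effect and the encryption of intermediate statistics, and uses hypothetical function names (`local_null_stats`, `local_score_stats`, `federated_score_test`). Each site sends only summed statistics, and a coordinator combines them into a score test statistic for one variant.

```python
import numpy as np

def local_null_stats(X, y):
    """Round 1, run at each site: summary statistics for the null model y ~ X."""
    return X.T @ X, X.T @ y, y @ y, len(y)

def local_score_stats(X, y, g, beta):
    """Round 2, run at each site: score-test pieces for one variant g."""
    r = y - X @ beta  # residuals under the globally fitted null model
    return g @ r, g @ g, X.T @ g

def federated_score_test(sites):
    """sites: list of (X, y, g) per site. Returns the score chi-square statistic.

    Only aggregated matrices and vectors cross site boundaries
    (assumption: a trusted coordinator and no encryption in this toy version).
    """
    # Round 1: the coordinator sums the per-site statistics and fits the null model.
    stats = [local_null_stats(X, y) for X, y, _ in sites]
    XtX = sum(s[0] for s in stats)
    Xty = sum(s[1] for s in stats)
    yty = sum(s[2] for s in stats)
    n = sum(s[3] for s in stats)
    beta = np.linalg.solve(XtX, Xty)
    p = XtX.shape[0]
    sigma2 = (yty - Xty @ beta) / (n - p)  # residual variance from sums only

    # Round 2: sites receive beta and return the score-test pieces.
    pieces = [local_score_stats(X, y, g, beta) for X, y, g in sites]
    score = sum(q[0] for q in pieces)
    gtg = sum(q[1] for q in pieces)
    Xtg = sum(q[2] for q in pieces)

    # Variance of the score after projecting out the covariates.
    var = sigma2 * (gtg - Xtg @ np.linalg.solve(XtX, Xtg))
    return score**2 / var
```

Calling `federated_score_test` on a list of per-site tuples gives the same statistic as calling it on a single tuple holding the stacked data, up to floating-point error, because every quantity involved is a sum over samples. A mixed-model version would additionally need the variance components and the relatedness structure handled across sites, which this sketch does not attempt.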

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4591/11299833/2a85f3f3a7da/pcbi.1012142.g001.jpg
