Efficacy of federated learning on genomic data: a study on the UK Biobank and the 1000 Genomes Project.

Authors

Kolobkov Dmitry, Mishra Sharma Satyarth, Medvedev Aleksandr, Lebedev Mikhail, Kosaretskiy Egor, Vakhitov Ruslan

Affiliations

GENXT, Hinxton, United Kingdom.

Laboratory of Ecological Genetics, Vavilov Institute of General Genetics, Moscow, Russia.

Publication information

Front Big Data. 2024 Feb 29;7:1266031. doi: 10.3389/fdata.2024.1266031. eCollection 2024.

DOI:10.3389/fdata.2024.1266031
PMID:38487517
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10937521/
Abstract

Combining training data from multiple sources increases sample size and reduces confounding, leading to more accurate and less biased machine learning models. In healthcare, however, direct pooling of data is often not allowed by data custodians, who are accountable for minimizing the exposure of sensitive information. Federated learning offers a promising solution to this problem by training a model in a decentralized manner, thus reducing the risk of data leakage. Although federated learning is increasingly used on clinical data, its efficacy on individual-level genomic data has not been studied. This study lays the groundwork for the adoption of federated learning for genomic data by investigating its applicability in two scenarios: phenotype prediction on the UK Biobank data and ancestry prediction on the 1000 Genomes Project data. We show that federated models trained on data split into independent nodes achieve performance close to centralized models, even in the presence of significant inter-node heterogeneity. Additionally, we investigate how federated model accuracy is affected by communication frequency and suggest approaches to reduce computational complexity or communication costs.
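
The decentralized training idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which federated algorithm the authors used, so the example below assumes federated averaging (FedAvg), a common choice, applied to a toy one-dimensional linear regression on synthetic data. The function names (`local_sgd`, `fed_avg`, `make_node`), the learning rate, and the data are hypothetical and chosen only for illustration; nothing here reproduces the paper's models or datasets.

```python
import random

def local_sgd(w, b, data, lr, local_steps):
    """Run `local_steps` full-batch gradient steps of 1-D least squares on one node."""
    n = len(data)
    for _ in range(local_steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def fed_avg(nodes, rounds, local_steps, lr=0.05):
    """Federated averaging (assumed algorithm): only (w, b) leave a node, never raw samples."""
    w, b = 0.0, 0.0
    total = sum(len(d) for d in nodes)
    for _ in range(rounds):
        # Each node starts from the current global model and trains locally.
        updates = [local_sgd(w, b, d, lr, local_steps) for d in nodes]
        # The server averages parameters, weighting each node by its sample share.
        w = sum(len(d) * u[0] for d, u in zip(nodes, updates)) / total
        b = sum(len(d) * u[1] for d, u in zip(nodes, updates)) / total
    return w, b

def make_node(rng, n, x_lo, x_hi):
    """Synthetic node: y = 2x + 1 + noise, with a node-specific feature range."""
    xs = [rng.uniform(x_lo, x_hi) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.1)) for x in xs]

rng = random.Random(0)
# Heterogeneous nodes: different sample sizes and different feature ranges.
nodes = [
    make_node(rng, 200, -1.0, 0.0),
    make_node(rng, 100, 0.0, 1.0),
    make_node(rng, 50, -0.5, 0.5),
]

w_fed, b_fed = fed_avg(nodes, rounds=200, local_steps=5)

# Centralized baseline: pool all samples and train a single model.
pooled = [pt for d in nodes for pt in d]
w_cen, b_cen = local_sgd(0.0, 0.0, pooled, lr=0.05, local_steps=1000)

print(f"federated:   w={w_fed:.3f}, b={b_fed:.3f}")
print(f"centralized: w={w_cen:.3f}, b={b_cen:.3f}")
```

In this sketch, `local_steps` plays the role of communication frequency: raising it while lowering `rounds` keeps the total local computation the same but has the nodes exchange parameters less often. How that trade-off affects accuracy on real genomic data is what the study investigates; the toy example makes no claim about it.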

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91db/10937521/332ae537d94d/fdata-07-1266031-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91db/10937521/cc614efbdcab/fdata-07-1266031-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91db/10937521/87feb7e77297/fdata-07-1266031-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91db/10937521/1546557e8362/fdata-07-1266031-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91db/10937521/f0a62fa68a1c/fdata-07-1266031-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91db/10937521/1a0783079b51/fdata-07-1266031-g0006.jpg
