TARGETING UNDERREPRESENTED POPULATIONS IN PRECISION MEDICINE: A FEDERATED TRANSFER LEARNING APPROACH.

Author information

Li Sai, Cai Tianxi, Duan Rui

Affiliations

Institute of Statistics and Big Data, Renmin University of China.

Department of Biostatistics, Harvard T.H. Chan School of Public Health.

Publication information

Ann Appl Stat. 2023 Dec;17(4):2970-2992. doi: 10.1214/23-AOAS1747. Epub 2023 Oct 30.

DOI: 10.1214/23-AOAS1747

PMID: 39314265

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11417462/

Abstract

The limited representation of minorities and disadvantaged populations in large-scale clinical and genomics research poses a significant barrier to translating precision medicine research into practice. Prediction models are likely to underperform in underrepresented populations due to heterogeneity across populations, thereby exacerbating known health disparities. To address this issue, we propose FETA, a two-way data integration method that leverages a federated transfer learning approach to integrate heterogeneous data from diverse populations and multiple healthcare institutions, with a focus on a target population of interest having limited sample sizes. We show that FETA achieves performance comparable to the pooled analysis, where individual-level data is shared across institutions, with only a small number of communications across participating sites. Our theoretical analysis and simulation study demonstrate how FETA's estimation accuracy is influenced by communication budgets, privacy restrictions, and heterogeneity across populations. We apply FETA to multisite data from the electronic Medical Records and Genomics (eMERGE) Network to construct genetic risk prediction models for extreme obesity. Compared to models trained using target data only, source data only, and all data without accounting for population-level differences, FETA shows superior predictive performance. FETA has the potential to improve estimation and prediction accuracy in underrepresented populations and reduce the gap in model performance across populations.
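The abstract does not spell out FETA's estimator, so the sketch below is not the authors' method. It is a generic stand-in that illustrates the two ideas the abstract combines, under loudly stated assumptions: a linear least-squares model instead of the paper's genetic risk models; source sites that send only aggregated matrices (X^T X and X^T y) in a single round instead of individual-level rows; and a ridge-penalised, target-specific shift fitted on the small target sample. The helper names `site_summaries`, `federated_source_fit`, `transfer_to_target`, the penalty `lam`, and all simulated data are hypothetical.

```python
import numpy as np


def site_summaries(X, y):
    """Hypothetical helper: aggregated statistics one site sends (no individual rows)."""
    return X.T @ X, X.T @ y


def federated_source_fit(summaries):
    """Sum per-site summaries and solve the normal equations.
    For least squares this equals the pooled OLS fit after one round of communication."""
    xtx = sum(s[0] for s in summaries)
    xty = sum(s[1] for s in summaries)
    return np.linalg.solve(xtx, xty)


def transfer_to_target(X_t, y_t, beta_source, lam=1.0):
    """Assumed transfer step: estimate a target-specific shift delta by minimising
    ||y_t - X_t (beta_source + delta)||^2 + lam * ||delta||^2 (ridge on the residuals).
    lam -> infinity keeps beta_source; lam = 0 gives target-only OLS."""
    resid = y_t - X_t @ beta_source
    p = X_t.shape[1]
    delta = np.linalg.solve(X_t.T @ X_t + lam * np.eye(p), X_t.T @ resid)
    return beta_source + delta


# Made-up simulation: three large source sites and one small target sample.
rng = np.random.default_rng(0)
p = 5
beta_src = np.array([1.0, -0.5, 0.3, 0.0, 0.8])
beta_tgt = beta_src + np.array([0.0, 0.0, 0.2, 0.0, 0.0])  # hypothetical population shift

sources = []
for n in (500, 800, 600):
    X = rng.normal(size=(n, p))
    sources.append((X, X @ beta_src + rng.normal(size=n)))
X_t = rng.normal(size=(60, p))
y_t = X_t @ beta_tgt + rng.normal(size=60)

# Federated fit from summaries only, compared with a pooled fit on stacked rows.
beta_fed = federated_source_fit([site_summaries(X, y) for X, y in sources])
X_pool = np.vstack([X for X, _ in sources])
y_pool = np.concatenate([y for _, y in sources])
beta_pooled = np.linalg.lstsq(X_pool, y_pool, rcond=None)[0]
print(np.allclose(beta_fed, beta_pooled))  # True

# Borrow strength from the sources while allowing a target-specific shift.
beta_hat = transfer_to_target(X_t, y_t, beta_fed, lam=5.0)
print(np.round(beta_hat, 2))
```

In this toy setting the penalty interpolates between the baselines named in the abstract: a very large `lam` returns the source-only fit, `lam = 0` returns the target-only fit, and intermediate values shrink the target estimate toward the sources. The exact match with the pooled fit holds only because least squares has sufficient statistics; nonlinear models generally need iterative or surrogate schemes, which is where communication budgets matter.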


Similar articles

COMMUTE: Communication-efficient transfer learning for multi-site risk prediction.

J Biomed Inform. 2023 Jan;137:104243. doi: 10.1016/j.jbi.2022.104243. Epub 2022 Nov 18.

Empowering Precision Medicine: Unlocking Revolutionary Insights through Blockchain-Enabled Federated Learning and Electronic Medical Records.

Sensors (Basel). 2023 Aug 28;23(17):7476. doi: 10.3390/s23177476.

Tackling heterogeneity in medical federated learning via aligning vision transformers.

Artif Intell Med. 2024 Sep;155:102936. doi: 10.1016/j.artmed.2024.102936. Epub 2024 Jul 25.

Combining Federated Machine Learning and Qualitative Methods to Investigate Novel Pediatric Asthma Subtypes: Protocol for a Mixed Methods Study.

JMIR Res Protoc. 2024 Jul 8;13:e57981. doi: 10.2196/57981.

AFEI: adaptive optimized vertical federated learning for heterogeneous multi-omics data integration.

Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad269.

SplitAVG: A Heterogeneity-Aware Federated Deep Learning Method for Medical Imaging.

IEEE J Biomed Health Inform. 2022 Sep;26(9):4635-4644. doi: 10.1109/JBHI.2022.3185956. Epub 2022 Sep 9.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Data heterogeneity in federated learning with Electronic Health Records: Case studies of risk prediction for acute kidney injury and sepsis diseases in critical care.

PLOS Digit Health. 2023 Mar 15;2(3):e0000117. doi: 10.1371/journal.pdig.0000117. eCollection 2023 Mar.

Analyzing the Impact of Personalization on Fairness in Federated Learning for Healthcare.

J Healthc Inform Res. 2024 Mar 23;8(2):181-205. doi: 10.1007/s41666-024-00164-7. eCollection 2024 Jun.

Cited by

Enhancing Genetic Risk Prediction through Federated Semi-Supervised Transfer Learning with Inaccurate Electronic Health Record Data.

Stat Biosci. 2024 Aug 13. doi: 10.1007/s12561-024-09449-2.

Bridging Data Gaps in Healthcare: A Scoping Review of Transfer Learning in Structured Data Analysis.

Health Data Sci. 2025 Sep 3;5:0321. doi: 10.34133/hds.0321. eCollection 2025.

Robust angle-based transfer learning in high dimensions.

J R Stat Soc Series B Stat Methodol. 2024 Dec 3;87(3):723-745. doi: 10.1093/jrsssb/qkae111. eCollection 2025 Jul.

Uncovering Heterogeneous Effects via Localized Feature Selection.

bioRxiv. 2025 Jun 7:2025.06.03.657761. doi: 10.1101/2025.06.03.657761.

Semi-supervised Triply Robust Inductive Transfer Learning.

J Am Stat Assoc. 2025;120:1037-1047. doi: 10.1080/01621459.2024.2393463. Epub 2024 Oct 10.

Transfer learning for mortality risk: A case study on the United Kingdom.

PLoS One. 2025 May 23;20(5):e0313378. doi: 10.1371/journal.pone.0313378. eCollection 2025.

Transdiagnostic Polygenic Risk Models for Psychopathology and Comorbidity: Cross-Ancestry Analysis in the Research Program.

medRxiv. 2025 Mar 28:2025.03.26.25324720. doi: 10.1101/2025.03.26.25324720.

Polygenic prediction for underrepresented populations through transfer learning by utilizing genetic similarity shared with European populations.

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf048.

A robust transfer learning approach for high-dimensional linear regression to support integration of multi-source gene expression data.

PLoS Comput Biol. 2025 Jan 10;21(1):e1012739. doi: 10.1371/journal.pcbi.1012739. eCollection 2025 Jan.

Multi-Task Learning with Summary Statistics.

Adv Neural Inf Process Syst. 2023;36:54020-54031. Epub 2024 May 30.

References

Transfer Learning under High-dimensional Generalized Linear Models.

J Am Stat Assoc. 2023;118(544):2684-2697. doi: 10.1080/01621459.2022.2071278. Epub 2022 Jun 27.

Transfer Learning in Large-scale Gaussian Graphical Models with False Discovery Rate Control.

J Am Stat Assoc. 2023;118(543):2171-2183. doi: 10.1080/01621459.2022.2044333. Epub 2022 Mar 18.

Individual Data Protected Integrative Regression Analysis of High-Dimensional Heterogeneous Data.

J Am Stat Assoc. 2022;117(540):2105-2119. doi: 10.1080/01621459.2021.1904958. Epub 2021 May 19.

Integrative High Dimensional Multiple Testing with Heterogeneity under Data Sharing Constraints.

J Mach Learn Res. 2021 Apr;22.

Global Biobank Meta-analysis Initiative: Powering genetic discovery across human disease.

Cell Genom. 2022 Oct 12;2(10):100192. doi: 10.1016/j.xgen.2022.100192.

Association of childhood BMI trajectory with post-adolescent and adult lung function is mediated by pre-adolescent DNA methylation.

Respir Res. 2022 Jul 29;23(1):194. doi: 10.1186/s12931-022-02089-4.

Transfer Learning for High-Dimensional Linear Regression: Prediction, Estimation and Minimax Optimality.

J R Stat Soc Series B Stat Methodol. 2022 Feb;84(1):149-173. doi: 10.1111/rssb.12479. Epub 2021 Nov 16.

An artificial intelligence framework integrating longitudinal electronic health records with real-world data enables continuous pan-cancer prognostication.

Nat Cancer. 2021 Jul;2(7):709-722. doi: 10.1038/s43018-021-00236-2. Epub 2021 Jul 22.

The genetics of obesity: from discovery to biology.

Nat Rev Genet. 2022 Feb;23(2):120-133. doi: 10.1038/s41576-021-00414-z. Epub 2021 Sep 23.

A unified framework for cross-population trait prediction by leveraging the genetic correlation of polygenic traits.

Am J Hum Genet. 2021 Apr 1;108(4):632-655. doi: 10.1016/j.ajhg.2021.03.002. Epub 2021 Mar 25.
