COMMUTE: Communication-efficient transfer learning for multi-site risk prediction.

Affiliations

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States.

Department of Psychiatry, Harvard Medical School, Boston, MA, United States; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, United States; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, United States.

Publication information

J Biomed Inform. 2023 Jan;137:104243. doi: 10.1016/j.jbi.2022.104243. Epub 2022 Nov 18.

Abstract

OBJECTIVES

We propose a communication-efficient transfer learning approach (COMMUTE) that effectively incorporates multi-site healthcare data for training a risk prediction model in a target population of interest, accounting for challenges including population heterogeneity and data sharing constraints across sites.

METHODS

We first train population-specific source models locally within each site. Using data from a given target population, COMMUTE learns a calibration term for each source model, which adjusts for potential data heterogeneity through flexible distance-based regularizations. In a centralized setting where multi-site data can be directly pooled, all data are combined to train the target model after calibration. When individual-level data cannot be shared by some sites, COMMUTE requests only the locally trained models from those sites and uses them to generate heterogeneity-adjusted synthetic data for training the target model. We evaluate COMMUTE via extensive simulation studies and an application to multi-site data from the electronic Medical Records and Genomics (eMERGE) Network to predict extreme obesity.
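
The federated workflow described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a logistic risk model, uses a ridge (L2) penalty on the calibration term as a stand-in for the distance-based regularization, and generates synthetic covariates by resampling the target covariates. All function names (`fit_logistic`, `calibrate`, `synthesize`, `demo`) and all simulation settings are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def simulate(beta, n, rng):
    """Draw standard-normal covariates and binary outcomes from a logistic model."""
    X = rng.normal(size=(n, beta.size))
    y = rng.binomial(1, sigmoid(X @ beta))
    return X, y

def fit_logistic(X, y, offset=None, l2=0.0, lr=0.5, n_iter=2000):
    """Logistic regression by gradient descent, with optional fixed offset and ridge penalty."""
    n, p = X.shape
    offset = np.zeros(n) if offset is None else offset
    coef = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ coef + offset) - y) / n + l2 * coef
        coef -= lr * grad
    return coef

def calibrate(beta_source, X_t, y_t, l2=0.1):
    """Learn a calibration term delta on target data with the source model as offset.
    ASSUMPTION: the ridge penalty on delta stands in for the distance-based regularization."""
    delta = fit_logistic(X_t, y_t, offset=X_t @ beta_source, l2=l2)
    return beta_source + delta

def synthesize(beta_calibrated, X_t, n_syn, rng):
    """Generate synthetic outcomes from a calibrated source model.
    ASSUMPTION: synthetic covariates are resampled from the target covariates."""
    X_syn = X_t[rng.integers(0, X_t.shape[0], size=n_syn)]
    y_syn = rng.binomial(1, sigmoid(X_syn @ beta_calibrated))
    return X_syn, y_syn

def demo(seed=0, n_source=2000, n_target=200, n_syn=500):
    rng = np.random.default_rng(seed)
    beta_t = np.array([1.0, -1.0, 0.5, 0.0, 0.0])  # hypothetical target truth
    source_truths = [beta_t + 0.1, -beta_t]        # one similar, one very different source

    # Step 1: each site trains locally and shares only its coefficient vector, once.
    shared = []
    for b in source_truths:
        X_s, y_s = simulate(b, n_source, rng)
        shared.append(fit_logistic(X_s, y_s))

    # Step 2: the target site calibrates every shared source model on its own data.
    X_t, y_t = simulate(beta_t, n_target, rng)
    calibrated = [calibrate(b, X_t, y_t) for b in shared]

    # Step 3: pool target data with synthetic data and train the target model.
    blocks = [(X_t, y_t)] + [synthesize(b, X_t, n_syn, rng) for b in calibrated]
    X_pool = np.vstack([X for X, _ in blocks])
    y_pool = np.concatenate([y for _, y in blocks])
    beta_hat = fit_logistic(X_pool, y_pool)
    return {"beta_t": beta_t, "shared": shared, "calibrated": calibrated,
            "X_pool": X_pool, "beta_hat": beta_hat}

result = demo()
print("target-model coefficients:", np.round(result["beta_hat"], 2))
```

In this sketch the only objects that leave a source site are the fitted coefficient vectors, shared in a single round. In the centralized setting described above, the synthetic blocks would instead be replaced by the pooled individual-level source data after calibration.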

RESULTS

Simulation studies show that, over a broad spectrum of settings, COMMUTE outperforms methods that do not adjust for population heterogeneity and methods trained in a single population. Using eMERGE data, COMMUTE achieves an area under the receiver operating characteristic curve (AUC) of around 0.80, outperforming the other benchmark methods, whose AUCs range from 0.51 to 0.70.

CONCLUSION

COMMUTE improves risk prediction in a target population with limited samples and safeguards against negative transfer when some source populations differ substantially from the target. In a federated setting, it is highly communication efficient: each site shares its model parameter estimates only once, and no iterative communication or higher-order terms are needed.
