Enhancing Genetic Risk Prediction through Federated Semi-Supervised Transfer Learning with Inaccurate Electronic Health Record Data.

Authors

Lu Yuying, Gu Tian, Duan Rui

Affiliations

Department of Biostatistics, Columbia Mailman School of Public Health, New York, NY 10032, USA.

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.

Publication information

Stat Biosci. 2024 Aug 13. doi: 10.1007/s12561-024-09449-2.

Abstract

Large-scale genomics data combined with Electronic Health Records (EHRs) illuminate the path towards personalized disease management and enhanced medical interventions. However, the absence of "gold standard" disease labels makes developing machine learning models challenging. Additionally, imbalances in demographic representation within datasets compromise the development of unbiased healthcare solutions. In response to these challenges, we introduce FEderated Semi-Supervised Transfer Learning (FEST) for improving disease risk predictions in underrepresented populations. FEST facilitates the collaborative training of models across institutions by leveraging both labeled and unlabeled data from diverse subpopulations. It addresses distributional variations across populations and healthcare institutions by combining density ratio reweighting and model calibration techniques. We develop federated learning algorithms that train models using only summary-level statistics. We perform simulation studies to assess the efficacy of FEST in comparison with several alternative methods. Subsequently, we apply FEST to train a genetic risk prediction model for type 2 diabetes that targets the African-Ancestry population using data from the Massachusetts General Brigham (MGB) Biobank. Both our simulation studies and the real-world data application show that FEST outperforms the competing methods.
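To make the idea of training from summary-level statistics concrete, the following is a minimal sketch and not the FEST algorithm itself, whose details the abstract does not give. It assumes a plain logistic regression fitted by Newton's method, where each site shares only its gradient and Hessian at the current coefficients. The function names (`local_summaries`, `federated_logistic`) and the optional per-sample `weights` argument, which stands in for something like estimated density-ratio weights, are hypothetical and chosen for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def local_summaries(X, y, beta, weights=None):
    """Compute one site's gradient and Hessian of the weighted logistic loss.

    Only these aggregates leave the site; individual-level rows stay local.
    `weights` is a hypothetical hook for per-sample weights (for example,
    density-ratio estimates); None means every sample has weight 1.
    """
    if weights is None:
        weights = np.ones(len(y))
    p = sigmoid(X @ beta)
    grad = X.T @ (weights * (p - y))
    hess = (X * (weights * p * (1.0 - p))[:, None]).T @ X
    return grad, hess


def federated_logistic(sites, dim, n_iter=25, ridge=1e-6):
    """Newton iterations driven by site-level summaries only.

    `sites` is a list of (X, y, weights) tuples, one per institution.
    A small ridge term is added to the Hessian purely for numerical stability.
    """
    beta = np.zeros(dim)
    for _ in range(n_iter):
        summaries = [local_summaries(X, y, beta, w) for X, y, w in sites]
        grad = sum(g for g, _ in summaries)
        hess = sum(h for _, h in summaries) + ridge * np.eye(dim)
        beta = beta - np.linalg.solve(hess, grad)
    return beta
```

Because the summed gradients and Hessians equal those of the pooled data, this sketch returns the same estimate as fitting one model on all rows together, which is the basic appeal of sharing summaries rather than records. Handling inaccurate labels, unlabeled data, and calibration, as the abstract describes for FEST, would require additional steps not shown here.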
