Transfer learning prediction of type 2 diabetes with unpaired clinical and genetic data.

Authors

Jung YounSung, Han SeanKyo, Kang EunHee, Park SoYoung, Kim MinHee, Kim NanHee, Ahn TaeJin

Affiliations

Department of Life Science, Handong Global University, Pohang, Republic of Korea.

Division of Endocrinology and Metabolism, Department of Internal Medicine, Korea University Ansan Hospital, Ansan, Republic of Korea.

Publication information

Sci Rep. 2025 Jul 29;15(1):27695. doi: 10.1038/s41598-025-05532-w.

Abstract

The prevalence of type 2 diabetes mellitus (T2DM) in Korea has risen in recent years, yet many cases remain undiagnosed. Advanced artificial intelligence models using multi-modal data have shown promise in disease prediction, but two major challenges persist: the scarcity of samples containing all desired data modalities and class imbalance in T2DM datasets. We propose a novel transfer learning framework to predict T2DM onset within five years, using two Korean cohorts (KoGES and SNUH). To utilize unpaired multi-modal data, our approach transfers knowledge between clinical and genetic domains, leveraging unpaired clinical data alongside paired data. We also address class imbalance by applying a positively weighted binary cross-entropy (BCE) loss and a weighted random sampler (WRS). The transfer learning framework improved T2DM prediction performance. Using WRS and weighted BCE loss increased the model's balanced accuracy and AUC (achieving test AUC 0.8441). Furthermore, combining transfer learning with intermediate data fusion yielded even higher performance (test AUC 0.8715). These enhancements were achieved despite limited paired multi-modal samples. Our framework effectively handles scarce paired data and class imbalance, leading to improved T2DM risk prediction. This approach can be adapted to other medical prediction tasks and integrated with additional data modalities, potentially aiding earlier diagnosis and better disease management in clinical settings.
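
The two class-imbalance remedies named in the abstract, a positively weighted BCE loss and a weighted random sampler, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names, the toy 10/90 label split, and the choice of pos_weight as the negative-to-positive ratio are all assumptions made for illustration, and the abstract does not say which library or weighting scheme was used.

```python
import math
import random
from collections import Counter


def weighted_bce(probs, labels, pos_weight):
    """Mean binary cross-entropy with the positive-class term scaled by pos_weight."""
    eps = 1e-12  # clamp probabilities to avoid log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def inverse_frequency_weights(labels):
    """Per-sample weight of 1 / (size of the sample's class)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def weighted_sample(labels, num_samples, seed=0):
    """Draw indices with replacement so each class is picked about equally often."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)


# Hypothetical toy labels: 10 positives (onset) and 90 negatives.
labels = [1] * 10 + [0] * 90

# Assumed heuristic: weight positives by the negative-to-positive ratio.
pos_weight = labels.count(0) / labels.count(1)

# Resample the toy data and measure how often a positive is drawn.
indices = weighted_sample(labels, 10000)
pos_fraction = sum(labels[i] for i in indices) / len(indices)

print(f"pos_weight = {pos_weight}")
print(f"positive fraction after resampling = {pos_fraction:.3f}")
```

With inverse-frequency weights, each class carries the same total sampling mass, so resampled batches are roughly balanced even though positives make up only 10% of the toy labels. The weighted loss acts on the other side of the problem: it raises the penalty for misclassifying a positive case without changing which samples are drawn.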
