Li Siqi, Li Xin, Yu Kunyu, Wu Qiming, Miao Di, Zhu Mingcheng, Yan Mengying, Ke Yuhe, D'Agostino Danny, Ning Yilin, Wang Ziwen, Shang Yuqing, Liu Molei, Hong Chuan, Liu Nan
Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore.
Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA.
Health Data Sci. 2025 Sep 3;5:0321. doi: 10.34133/hds.0321. eCollection 2025.
Clinical and biomedical research in low-resource settings often faces substantial challenges due to the need for high-quality data with sufficient sample sizes to construct effective models. These constraints hinder robust model training and prompt researchers to seek methods for leveraging existing knowledge from related studies to support new research efforts. Transfer learning (TL), a machine learning technique, emerges as a powerful solution by utilizing knowledge from pretrained models to enhance the performance of new models, offering promise across various healthcare domains. Despite its conceptual origins in the 1990s, the application of TL in medical research has remained limited, especially beyond image analysis. This review aims to analyze TL applications, highlight overlooked techniques, and suggest improvements for future healthcare research. Following the PRISMA-ScR guidelines, we searched the SCOPUS, MEDLINE, Web of Science, Embase, and CINAHL databases for published articles that employed TL with structured clinical or biomedical data. We screened 5,080 papers, of which 86 met the inclusion criteria. Among these, only 2% (2 of 86) utilized external studies, and 5% (4 of 86) addressed scenarios involving multi-site collaborations with privacy constraints. To achieve actionable TL with structured medical data while addressing regional disparities, inequality, and privacy constraints in healthcare research, we advocate for the careful identification of appropriate source data and models, the selection of suitable TL frameworks, and the validation of TL models with proper baselines.