Semi-supervised Triply Robust Inductive Transfer Learning.

Authors

Cai Tianxi, Li Mengyan, Liu Molei

Affiliations

Department of Biostatistics, Harvard T.H. Chan School of Public Health.

Department of Biomedical Informatics, Harvard Medical School.

Publication information

J Am Stat Assoc. 2025;120:1037-1047. doi: 10.1080/01621459.2024.2393463. Epub 2024 Oct 10.

Abstract

In this work, we propose a Semi-supervised Triply Robust Inductive transFer LEarning (STRIFLE) approach, which integrates heterogeneous data from a label-rich source population and a label-scarce target population and utilizes a large amount of unlabeled data simultaneously to improve the learning accuracy in the target population. Specifically, we consider a high dimensional covariate shift setting and employ two nuisance models, a density ratio model and an imputation model, to combine transfer learning and surrogate-assisted semi-supervised learning strategies effectively and achieve triple robustness. While the STRIFLE approach assumes the target and source populations to share the same conditional distribution of outcome Y given both the surrogate features S and predictors X, it allows the true underlying model of Y | X to differ between the two populations due to the potential covariate shift in S and X. Different from double robustness, even if both nuisance models are misspecified or the distribution of Y | S, X is not the same between the two populations, the triply robust STRIFLE estimator can still partially use the source population when the shifted source population and the target population share enough similarities. Moreover, it is guaranteed to be no worse than the target-only surrogate-assisted semi-supervised estimator with an additional error term from transferability detection. These desirable properties of our estimator are established theoretically and verified in finite samples via extensive simulation studies. We utilize the STRIFLE estimator to train a Type II diabetes polygenic risk prediction model for the African American target population by transferring knowledge from electronic health records linked genomic data observed in a larger European source population.
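To make the roles of the two nuisance models concrete, the following is a minimal sketch of how a density ratio model and an imputation model can be combined under covariate shift. It is not the STRIFLE estimator: it targets a simple population mean with a single predictor, and it leaves out surrogate features, high-dimensional regression, and transferability detection. The function names, the simulated data, and the choice of logistic regression for the density ratio are all assumptions made for illustration.

```python
import numpy as np


def fit_logistic(features, labels, n_iter=25, ridge=1e-8):
    """Newton-Raphson logistic regression; returns [intercept, slope]."""
    design = np.column_stack([np.ones(len(features)), features])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-design @ beta))
        grad = design.T @ (labels - prob) - ridge * beta
        hess = (design * (prob * (1.0 - prob))[:, None]).T @ design
        hess += ridge * np.eye(len(beta))
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def density_ratio(x_source, x_target):
    """Estimate p_target(x) / p_source(x) at the source points.

    Assumption: the log density ratio is linear in x, so a logistic
    classifier of target-vs-source membership recovers it.
    """
    pooled = np.concatenate([x_source, x_target])
    is_target = np.concatenate([np.zeros(len(x_source)), np.ones(len(x_target))])
    beta = fit_logistic(pooled, is_target)
    odds = np.exp(beta[0] + beta[1] * x_source)
    # Rescale classifier odds by the sample-size ratio to get a density ratio.
    return odds * len(x_source) / len(x_target)


def target_mean_estimate(x_source, y_source, x_target):
    """Imputation on unlabeled target data plus a weighted residual correction."""
    design = np.column_stack([np.ones(len(x_source)), x_source])
    coef, *_ = np.linalg.lstsq(design, y_source, rcond=None)

    def impute(x):
        return coef[0] + coef[1] * x

    weights = density_ratio(x_source, x_target)
    residuals = y_source - impute(x_source)
    correction = np.sum(weights * residuals) / np.sum(weights)
    return np.mean(impute(x_target)) + correction


def simulate(seed=0, n=20000):
    """Hypothetical setup: shared Y | X, shifted X, misspecified linear imputation."""
    rng = np.random.default_rng(seed)
    x_source = rng.normal(0.0, 1.0, n)   # labeled source predictors
    x_target = rng.normal(1.0, 1.0, n)   # unlabeled target predictors
    y_source = 1.0 + 2.0 * x_source + x_source**2 + rng.normal(0.0, 0.5, n)
    naive = np.mean(y_source)            # ignores the covariate shift
    corrected = target_mean_estimate(x_source, y_source, x_target)
    return naive, corrected


if __name__ == "__main__":
    naive, corrected = simulate()
    print(f"source-only mean: {naive:.2f}, shift-corrected estimate: {corrected:.2f}")
```

In this simulated setup the true target mean is 5 (since E[1 + 2X + X^2] = 5 when X ~ N(1, 1)). The linear imputation model is deliberately misspecified, yet the estimate stays close to the truth because the density ratio model is correctly specified. This is the ordinary double-robustness idea that the abstract contrasts with triple robustness.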


