Transfer Learning for High-Dimensional Linear Regression: Prediction, Estimation and Minimax Optimality.

Authors

Li Sai, Cai T Tony, Li Hongzhe

Affiliations

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104.

Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104.

Publication information

J R Stat Soc Series B Stat Methodol. 2022 Feb;84(1):149-173. doi: 10.1111/rssb.12479. Epub 2021 Nov 16.

Abstract

This paper considers estimation and prediction of a high-dimensional linear regression in the setting of transfer learning where, in addition to observations from the target model, auxiliary samples from different but possibly related regression models are available. When the set of informative auxiliary studies is known, an estimator and a predictor are proposed and their optimality is established. The optimal rates of convergence for prediction and estimation are faster than the corresponding rates without using the auxiliary samples. This implies that knowledge from the informative auxiliary samples can be transferred to improve the learning performance of the target problem. When the set of informative auxiliary samples is unknown, we propose a data-driven procedure for transfer learning, called Trans-Lasso, and show its robustness to non-informative auxiliary samples and its efficiency in knowledge transfer. The proposed procedures are demonstrated in numerical studies and are applied to a dataset concerning the associations among gene expressions. It is shown that Trans-Lasso leads to improved performance in gene expression prediction in a target tissue by incorporating data from multiple different tissues as auxiliary samples.
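The abstract does not spell out the procedure for the case where the informative auxiliary set is known. One plausible reading of "transfer knowledge, then adapt to the target" is a two-step Lasso: fit a pooled Lasso on the auxiliary and target samples together, then fit a second Lasso on the target residuals to correct the pooled fit. The sketch below illustrates that idea under this assumption; it is not the authors' implementation. The function names, tuning values, and simulated data are all hypothetical, and the Lasso solver is a plain coordinate-descent routine written for the example.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for min_b (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # running residual y - Xb
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the residual, with b[j] added back
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != b[j]:
                d = new - b[j]
                for i in range(n):
                    r[i] -= X[i][j] * d
                b[j] = new
    return b

def two_step_transfer(X0, y0, XA, yA, lam_w, lam_delta):
    """Assumed two-step scheme: pooled Lasso, then a sparse correction on the target."""
    w = lasso_cd(XA + X0, yA + y0, lam_w)       # step 1: pool auxiliary + target rows
    p = len(w)
    resid = [yi - sum(xi[j] * w[j] for j in range(p)) for xi, yi in zip(X0, y0)]
    delta = lasso_cd(X0, resid, lam_delta)      # step 2: correct using target data only
    return [w[j] + delta[j] for j in range(p)]

def simulate(n, coef, rng, noise_sd=0.5):
    """Draw n rows from a Gaussian linear model with the given coefficients."""
    p = len(coef)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(x[j] * coef[j] for j in range(p)) + rng.gauss(0, noise_sd) for x in X]
    return X, y

rng = random.Random(0)
p = 20
beta = [1.0, -1.0, 0.5] + [0.0] * (p - 3)       # hypothetical sparse target coefficients
beta_aux = list(beta)
beta_aux[3] = 0.2                               # auxiliary model differs in one coordinate
X0, y0 = simulate(15, beta, rng)                # small target sample
XA, yA = simulate(150, beta_aux, rng)           # larger auxiliary sample

beta_target_only = lasso_cd(X0, y0, lam=0.2)
beta_transfer = two_step_transfer(X0, y0, XA, yA, lam_w=0.05, lam_delta=0.2)

def l2_error(b):
    """Euclidean distance between an estimate and the true target coefficients."""
    return sum((b[j] - beta[j]) ** 2 for j in range(p)) ** 0.5

print("target-only Lasso error:", l2_error(beta_target_only))
print("two-step transfer error:", l2_error(beta_transfer))
```

The penalty levels here are fixed by hand for illustration; in practice they would be tuned, for example by cross-validation. The sketch also assumes every auxiliary sample is informative, so it does not cover the data-driven selection that Trans-Lasso performs when the informative set is unknown.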

Similar articles

5. Minimax Estimation of Functionals of Discrete Distributions.
IEEE Trans Inf Theory. 2015 May;61(5):2835-2885. doi: 10.1109/tit.2015.2412945. Epub 2015 Mar 13.
6. Transfer Learning under High-dimensional Generalized Linear Models.
J Am Stat Assoc. 2023;118(544):2684-2697. doi: 10.1080/01621459.2022.2071278. Epub 2022 Jun 27.
8. Multi-auxiliary domain transfer learning for diagnosis of MCI conversion.
Neurol Sci. 2022 Mar;43(3):1721-1739. doi: 10.1007/s10072-021-05568-6. Epub 2021 Sep 12.

Cited by

4. Robust angle-based transfer learning in high dimensions.
J R Stat Soc Series B Stat Methodol. 2024 Dec 3;87(3):723-745. doi: 10.1093/jrsssb/qkae111. eCollection 2025 Jul.
5. Semi-supervised Triply Robust Inductive Transfer Learning.
J Am Stat Assoc. 2025;120:1037-1047. doi: 10.1080/01621459.2024.2393463. Epub 2024 Oct 10.
9. Optimal and Safe Estimation for High-Dimensional Semi-Supervised Learning.
J Am Stat Assoc. 2024;119(548):2748-2759. doi: 10.1080/01621459.2023.2277409. Epub 2024 Jan 4.
