Transfer Learning for High-Dimensional Linear Regression: Prediction, Estimation and Minimax Optimality.

Authors

Li Sai, Cai T Tony, Li Hongzhe

Affiliations

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104.

Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104.

Publication information

J R Stat Soc Series B Stat Methodol. 2022 Feb;84(1):149-173. doi: 10.1111/rssb.12479. Epub 2021 Nov 16.

DOI:10.1111/rssb.12479
PMID:35210933
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8863181/
Abstract

This paper considers estimation and prediction of a high-dimensional linear regression in the setting of transfer learning where, in addition to observations from the target model, auxiliary samples from different but possibly related regression models are available. When the set of informative auxiliary studies is known, an estimator and a predictor are proposed and their optimality is established. The optimal rates of convergence for prediction and estimation are faster than the corresponding rates without using the auxiliary samples. This implies that knowledge from the informative auxiliary samples can be transferred to improve the learning performance of the target problem. When the set of informative auxiliary samples is unknown, we propose a data-driven procedure for transfer learning, called Trans-Lasso, and show its robustness to non-informative auxiliary samples and its efficiency in knowledge transfer. The proposed procedures are demonstrated in numerical studies and are applied to a dataset concerning the associations among gene expressions. It is shown that Trans-Lasso leads to improved performance in gene expression prediction in a target tissue by incorporating data from multiple different tissues as auxiliary samples.
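
The abstract does not spell out the estimator, so the following is only a minimal sketch of one common way to transfer knowledge when the informative auxiliary set is assumed known: fit a Lasso on the auxiliary samples, then fit a second Lasso on the target residuals to correct the difference between the two models. The two-step structure, the function names (`soft_threshold`, `lasso_cd`, `two_step_transfer`, `simulate`), the simulated data, and the penalty levels are all assumptions made for illustration. The data-driven selection of informative auxiliary samples that Trans-Lasso performs is not implemented here.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200, tol=1e-6):
    """Lasso via cyclic coordinate descent on (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # equals y - X beta, since beta starts at zero
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            step = new - beta[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * step
                beta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return beta


def two_step_transfer(X0, y0, XA, yA, lam_w, lam_delta):
    """Assumed two-step scheme: auxiliary fit w, then a sparse correction on the target."""
    w = lasso_cd(XA, yA, lam_w)
    resid0 = [yi - sum(x * wj for x, wj in zip(row, w)) for row, yi in zip(X0, y0)]
    delta = lasso_cd(X0, resid0, lam_delta)
    return [wj + dj for wj, dj in zip(w, delta)]


def simulate(n, beta, noise_sd, rng):
    """Gaussian design with linear response y = X beta + noise (illustrative data only)."""
    X = [[rng.gauss(0.0, 1.0) for _ in beta] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, beta)) + rng.gauss(0.0, noise_sd) for row in X]
    return X, y


def sq_error(estimate, truth):
    """Squared Euclidean estimation error."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth))


rng = random.Random(0)
p, n_target, n_aux, noise_sd = 30, 20, 200, 1.0
beta_target = [1.0] * 5 + [0.0] * (p - 5)
beta_aux = list(beta_target)
beta_aux[0] = 1.3  # auxiliary model differs from the target in one coefficient

X0, y0 = simulate(n_target, beta_target, noise_sd, rng)
XA, yA = simulate(n_aux, beta_aux, noise_sd, rng)

# Penalty levels of order sqrt(log p / n); a conventional choice, not tuned.
lam_target = noise_sd * math.sqrt(2 * math.log(p) / n_target)
lam_aux = noise_sd * math.sqrt(2 * math.log(p) / n_aux)

beta_lasso = lasso_cd(X0, y0, lam_target)  # target samples only
beta_transfer = two_step_transfer(X0, y0, XA, yA, lam_aux, lam_target)

err_target = sq_error(beta_lasso, beta_target)
err_transfer = sq_error(beta_transfer, beta_target)
print(f"target-only Lasso error: {err_target:.3f}")
print(f"two-step transfer error: {err_transfer:.3f}")
```

In this toy setting the auxiliary sample is ten times larger than the target sample and its coefficients are close to the target's, which is the regime in which the abstract says auxiliary information improves the rates. If the auxiliary model were far from the target, the first-step fit could hurt rather than help, which is the situation the robustness claim for Trans-Lasso addresses.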


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/405a/8863181/2d4367a7af8f/nihms-1755759-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/405a/8863181/2d3b54277ef6/nihms-1755759-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/405a/8863181/cecdf1673e97/nihms-1755759-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/405a/8863181/1769380428a3/nihms-1755759-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/405a/8863181/0ce6b1e39fe5/nihms-1755759-f0005.jpg
