HIGH-DIMENSIONAL FACTOR REGRESSION FOR HETEROGENEOUS SUBPOPULATIONS.

Authors

Wang Peiyao, Li Quefeng, Shen Dinggang, Liu Yufeng

Affiliations

University of North Carolina at Chapel Hill.

ShanghaiTech University.

Publication information

Stat Sin. 2023 Jan;33(1):27-53. doi: 10.5705/ss.202020.0145.

Abstract

In modern scientific research, data heterogeneity is commonly observed owing to the abundance of complex data. We propose a factor regression model for data with heterogeneous subpopulations. The proposed model can be represented as a decomposition into heterogeneous and homogeneous terms. The heterogeneous term is driven by latent factors in different subpopulations. The homogeneous term captures common variation in the covariates and shares common regression coefficients across subpopulations. Our proposed model attains a good balance between a global model and a group-specific model: the global model ignores the data heterogeneity, while the group-specific model fits each subgroup separately. We prove estimation and prediction consistency for our proposed estimators, and show that they have better convergence rates than those of the group-specific and global models. We show that the extra cost of estimating latent factors is asymptotically negligible and that the minimax rate is still attainable. We further demonstrate the robustness of our proposed method by studying its prediction error under a mis-specified group-specific model. Finally, we conduct simulation studies and analyze a data set from the Alzheimer's Disease Neuroimaging Initiative and an aggregated microarray data set to further demonstrate the competitiveness and interpretability of our proposed factor regression model.
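The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' estimator: the abstract does not give the model equations or the fitting procedure, so everything below is an assumption made for illustration. Specifically, it assumes that each subpopulation's covariates split into a latent-factor part and a residual part, that the factors can be estimated by principal components within each group, and that the response is regressed on group-specific factor coefficients plus one coefficient vector shared across groups. A ridge penalty is swapped in for simplicity where a high-dimensional setting would typically call for a sparsity-inducing penalty. The function names `estimate_factors` and `fit_factor_regression` and all simulation settings are hypothetical.

```python
import numpy as np


def estimate_factors(X, n_factors):
    """Split centered covariates into an estimated factor part and a residual part (PCA)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    left, _, _ = np.linalg.svd(Xc, full_matrices=False)
    # Scale the leading left singular vectors so that F'F / n = I.
    F = left[:, :n_factors] * np.sqrt(n)
    # Residual covariates: the part of Xc orthogonal to the estimated factors.
    U = Xc - F @ (F.T @ Xc) / n
    return F, U


def fit_factor_regression(X_groups, y_groups, n_factors, ridge=1e-3):
    """Assumed two-part fit: group-specific factor effects plus one shared coefficient vector."""
    parts = [estimate_factors(X, n_factors) for X in X_groups]
    y_centered = [y - y.mean() for y in y_groups]
    # Heterogeneous term: each group gets its own coefficients on its own factors.
    gammas = [F.T @ y / F.shape[0] for (F, _), y in zip(parts, y_centered)]
    # Homogeneous term: pooled ridge regression on the residual covariates.
    U_all = np.vstack([U for _, U in parts])
    r_all = np.concatenate(
        [y - F @ g for (F, _), y, g in zip(parts, y_centered, gammas)]
    )
    p = U_all.shape[1]
    beta = np.linalg.solve(U_all.T @ U_all + ridge * np.eye(p), U_all.T @ r_all)
    fitted = [
        y.mean() + F @ g + U @ beta
        for (F, U), y, g in zip(parts, y_groups, gammas)
    ]
    return gammas, beta, fitted


# Simulated data under the assumed model (all settings are illustrative).
rng = np.random.default_rng(0)
n, p, k = 200, 30, 2
beta_true = np.zeros(p)
beta_true[:3] = 1.0  # shared coefficients, identical in both groups
X_groups, y_groups = [], []
for gamma_true in (np.array([2.0, -1.0]), np.array([-2.0, 1.0])):
    loadings = rng.normal(size=(p, k))  # group-specific factor loadings
    f = rng.normal(size=(n, k))         # latent factors
    u = rng.normal(size=(n, p))         # idiosyncratic covariate variation
    X_groups.append(f @ loadings.T + u)
    y_groups.append(f @ gamma_true + u @ beta_true + 0.1 * rng.normal(size=n))

gammas, beta, fitted = fit_factor_regression(X_groups, y_groups, n_factors=k)
```

Because the estimated factors are orthogonal to the residual covariates by construction, the group-specific and shared coefficients can be fitted in two separate steps here. The estimated factors are identified only up to rotation and sign, so `gammas` need not match the simulated `gamma_true` values entry by entry.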
