High-Dimensional Factor Regression for Heterogeneous Subpopulations

Author information

Wang Peiyao, Li Quefeng, Shen Dinggang, Liu Yufeng

Affiliations

University of North Carolina at Chapel Hill.

ShanghaiTech University.

Publication information

Stat Sin. 2023 Jan;33(1):27-53. doi: 10.5705/ss.202020.0145.

Abstract

In modern scientific research, data heterogeneity is commonly observed owing to the abundance of complex data. We propose a factor regression model for data with heterogeneous subpopulations. The proposed model can be represented as a decomposition into heterogeneous and homogeneous terms. The heterogeneous term is driven by latent factors in different subpopulations. The homogeneous term captures common variation in the covariates and shares common regression coefficients across subpopulations. Our proposed model attains a good balance between a global model and a group-specific model. The global model ignores the data heterogeneity, while the group-specific model fits each subgroup separately. We prove the estimation and prediction consistency of our proposed estimators, and show that they have better convergence rates than those of the group-specific and global models. We show that the extra cost of estimating latent factors is asymptotically negligible and that the minimax rate is still attainable. We further demonstrate the robustness of our proposed method by studying its prediction error under a misspecified group-specific model. Finally, we conduct simulation studies and analyze a data set from the Alzheimer's Disease Neuroimaging Initiative and an aggregated microarray data set to further demonstrate the competitiveness and interpretability of our proposed factor regression model.
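
The decomposition described in the abstract can be illustrated with a small sketch. The code below is not the authors' estimator. It assumes a toy model in which each group's covariates follow X = F L^T + U, and the response is y = F gamma_m + U beta + noise, with gamma_m specific to group m and beta shared across groups. Factors are estimated by a per-group SVD, and plain least squares stands in for whatever high-dimensional (penalized) fit the paper uses. The function names `simulate` and `fit_factor_regression`, and all parameter values, are hypothetical.

```python
import numpy as np


def simulate(rng, n=300, p=50, k=2, gammas=((2.0, -1.0), (-1.0, 3.0)), noise=0.5):
    """Draw toy data under the assumed model, one (X, y) pair per group."""
    beta = np.zeros(p)
    beta[:5] = 1.0  # shared coefficients on the idiosyncratic part (assumed)
    X_groups, y_groups = [], []
    for gamma in gammas:
        F = rng.normal(size=(n, k))  # latent factors of this group
        L = rng.normal(size=(p, k))  # group-specific loadings
        U = rng.normal(size=(n, p))  # idiosyncratic covariate variation
        X_groups.append(F @ L.T + U)
        y_groups.append(F @ np.asarray(gamma) + U @ beta + noise * rng.normal(size=n))
    return X_groups, y_groups, beta


def fit_factor_regression(X_groups, y_groups, n_factors):
    """Two-step sketch: per-group SVD factors, then one pooled least-squares fit."""
    F_hat, U_hat = [], []
    for X in X_groups:
        Xc = X - X.mean(axis=0)
        left, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        F = left[:, :n_factors] * s[:n_factors]  # estimated factors (up to rotation)
        F_hat.append(F)
        U_hat.append(Xc - F @ Vt[:n_factors])    # covariates with factors removed

    M, p = len(X_groups), X_groups[0].shape[1]
    n_total = sum(len(X) for X in X_groups)
    # Design matrix: block-diagonal factor columns (heterogeneous term)
    # followed by stacked residual columns (homogeneous term, shared beta).
    Z = np.zeros((n_total, M * n_factors + p))
    row = 0
    for m, (F, U) in enumerate(zip(F_hat, U_hat)):
        n = len(F)
        Z[row:row + n, m * n_factors:(m + 1) * n_factors] = F
        Z[row:row + n, M * n_factors:] = U
        row += n

    y = np.concatenate([yg - yg.mean() for yg in y_groups])  # center within group
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    gammas_hat = coef[:M * n_factors].reshape(M, n_factors)
    beta_hat = coef[M * n_factors:]
    return gammas_hat, beta_hat
```

Calling `simulate(np.random.default_rng(0))` and passing the result to `fit_factor_regression(..., n_factors=2)` returns one row of factor coefficients per group and a single shared coefficient vector. The factor coefficients are expressed in the estimated factor basis, so only the shared vector is directly comparable with the simulated truth.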
