HIGH-DIMENSIONAL FACTOR REGRESSION FOR HETEROGENEOUS SUBPOPULATIONS.

Authors

Wang Peiyao, Li Quefeng, Shen Dinggang, Liu Yufeng

Affiliations

University of North Carolina at Chapel Hill.

ShanghaiTech University.

Publication

Stat Sin. 2023 Jan;33(1):27-53. doi: 10.5705/ss.202020.0145.

DOI:10.5705/ss.202020.0145

PMID:37854586

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10583735/

Abstract

In modern scientific research, data heterogeneity is commonly observed owing to the abundance of complex data. We propose a factor regression model for data with heterogeneous subpopulations. The proposed model can be represented as a decomposition of heterogeneous and homogeneous terms. The heterogeneous term is driven by latent factors in different subpopulations. The homogeneous term captures common variation in the covariates and shares common regression coefficients across subpopulations. Our proposed model attains a good balance between a global model and a group-specific model. The global model ignores the data heterogeneity, while the group-specific model fits each subgroup separately. We prove the estimation and prediction consistency for our proposed estimators, and show that they have faster convergence rates than those of the group-specific and global models. We show that the extra cost of estimating latent factors is asymptotically negligible and the minimax rate is still attainable. We further demonstrate the robustness of our proposed method by studying its prediction error under a mis-specified group-specific model. Finally, we conduct simulation studies and analyze a data set from the Alzheimer's Disease Neuroimaging Initiative and an aggregated microarray data set to further demonstrate the competitiveness and interpretability of our proposed factor regression model.
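The abstract describes the model only in words. The sketch below shows one plausible reading of the heterogeneous/homogeneous decomposition, under assumptions that are not taken from the paper. In each subpopulation g the covariates are assumed to follow X_g = F_g B_g^T + U_g with group-specific latent factors F_g. The response is assumed to be y_g = F_g γ_g + U_g β + ε, so the factor effects γ_g differ by group while β is shared. The factors are estimated by PCA (an SVD) within each group, and ordinary least squares stands in for the penalized estimators a truly high-dimensional setting would need. The functions `simulate`, `split_factors`, `fit_factor_regression` and `ols_rss` are hypothetical illustrations, not the authors' estimator.

```python
import numpy as np


def simulate(rng, n_per_group=400, p=20, k=2, n_groups=3, sigma=0.5):
    """Toy data (assumed model): X_g = F_g B_g^T + U_g, y_g = F_g gamma_g + U_g beta + noise."""
    beta = np.zeros(p)
    beta[:5] = 1.0  # shared (homogeneous) coefficients on U_g
    groups = []
    for _ in range(n_groups):
        B = rng.normal(size=(p, k))            # group-specific loadings
        F = rng.normal(size=(n_per_group, k))  # group-specific latent factors
        U = rng.normal(size=(n_per_group, p))  # remaining covariate variation
        gamma = 2.0 * rng.normal(size=k)       # group-specific factor effects
        X = F @ B.T + U
        y = F @ gamma + U @ beta + sigma * rng.normal(size=n_per_group)
        groups.append((X, y))
    return groups, beta


def split_factors(X, k):
    """PCA split X = F_hat (F_hat^T X) + U_hat using the top-k left singular vectors."""
    left, _, _ = np.linalg.svd(X, full_matrices=False)
    F_hat = left[:, :k]                # orthonormal columns
    U_hat = X - F_hat @ (F_hat.T @ X)  # F_hat^T U_hat = 0 by construction
    return F_hat, U_hat


def fit_factor_regression(groups, k):
    """Group-specific gamma_g on estimated factors, one beta shared across all groups."""
    F_hats, U_hats, y_resid, gammas = [], [], [], []
    for X, y in groups:
        F_hat, U_hat = split_factors(X, k)
        gamma = F_hat.T @ y            # OLS on orthonormal columns (estimated-factor basis)
        F_hats.append(F_hat)
        U_hats.append(U_hat)
        gammas.append(gamma)
        y_resid.append(y - F_hat @ gamma)
    # Pool the factor-adjusted covariates of every group and fit a common beta.
    beta, *_ = np.linalg.lstsq(np.vstack(U_hats), np.concatenate(y_resid), rcond=None)
    return gammas, beta, F_hats, U_hats


def ols_rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return float(r @ r)


rng = np.random.default_rng(0)
groups, beta_true = simulate(rng)
gammas, beta_hat, F_hats, U_hats = fit_factor_regression(groups, k=2)

# In-sample fit of the three models: factor model, global model, group-specific model.
rss_factor = sum(
    float(np.sum((y - F @ g - U @ beta_hat) ** 2))
    for (_, y), F, U, g in zip(groups, F_hats, U_hats, gammas)
)
rss_global = ols_rss(np.vstack([X for X, _ in groups]),
                     np.concatenate([y for _, y in groups]))
rss_group = sum(ols_rss(X, y) for X, y in groups)

print(f"group-specific RSS: {rss_group:.1f}")
print(f"factor model RSS:   {rss_factor:.1f}")
print(f"global RSS:         {rss_global:.1f}")
rel_err = np.linalg.norm(beta_hat - beta_true) / np.linalg.norm(beta_true)
print(f"relative error of shared beta: {rel_err:.3f}")
```

Because F̂_g is orthogonal to Û_g within each group, the two-step fit above equals a joint least-squares fit on [F̂_g, Û] with β shared. The global model's fitted values are reachable in this design (take γ_g = F̂_g^T X_g β), and this design's fitted values lie in each group's column space. So, in sample, the residual sums of squares are ordered: group-specific ≤ factor model ≤ global. That ordering only pictures the "balance" structurally. The paper's convergence-rate and minimax claims concern estimation and prediction error, which this toy run does not establish.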

Cited by

Joint and Individual Component Regression.

J Comput Graph Stat. 2024;33(3):763-773. doi: 10.1080/10618600.2023.2284227. Epub 2023 Dec 29.
