TEST OF SIGNIFICANCE FOR HIGH-DIMENSIONAL LONGITUDINAL DATA.

Author information

Fang Ethan X, Ning Yang, Li Runze

Affiliations

Department of Statistics, the Pennsylvania State University, University Park, PA 16802-2111, USA.

Department of Statistics and Data Science, Cornell University, Ithaca, NY 14850, USA.

Publication information

Ann Stat. 2020 Oct;48(5):2622-2645. doi: 10.1214/19-aos1900. Epub 2020 Sep 19.

Abstract

This paper concerns statistical inference for longitudinal data with ultrahigh dimensional covariates. We first study the problem of constructing confidence intervals and hypothesis tests for a low dimensional parameter of interest. The major challenge is how to construct a powerful test statistic in the presence of high-dimensional nuisance parameters and sophisticated within-subject correlation of longitudinal data. To deal with the challenge, we propose a new quadratic decorrelated inference function approach, which simultaneously removes the impact of nuisance parameters and incorporates the correlation to enhance the efficiency of the estimation procedure. When the parameter of interest is of fixed dimension, we prove that the proposed estimator is asymptotically normal and attains the semiparametric information bound, based on which we can construct an optimal Wald test statistic. We further extend this result and establish the limiting distribution of the estimator under the setting with the dimension of the parameter of interest growing with the sample size at a polynomial rate. Finally, we study how to control the false discovery rate (FDR) when a vector of high-dimensional regression parameters is of interest. We prove that applying Storey's (2002) procedure to the proposed test statistics for each regression parameter controls the FDR asymptotically in longitudinal data. We conduct simulation studies to assess the finite sample performance of the proposed procedures. Our simulation results indicate that the newly proposed procedure can control both the Type I error for testing a low dimensional parameter of interest and the FDR in the multiple testing problem. We also apply the proposed procedure to a real data example.
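The multiple-testing step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the quadratic decorrelated inference function itself is not reproduced, and the per-parameter estimates and standard errors are assumed to be already available. The function names `wald_p_value` and `storey_reject`, the tuning value `lam=0.5`, and the level `alpha=0.1` are hypothetical choices made for illustration. The sketch converts each Wald-type statistic into a two-sided p-value under a normal approximation, estimates the proportion of true nulls as in Storey (2002), and then applies a step-up rule with that estimate plugged in.

```python
import math


def wald_p_value(estimate, std_error):
    """Two-sided p-value for H0: beta_j = 0 under a normal approximation."""
    z = abs(estimate / std_error)
    # P(|Z| > z) for a standard normal Z equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


def storey_reject(p_values, alpha=0.1, lam=0.5):
    """Return (sorted indices of rejected hypotheses, estimated null proportion).

    Simplified sketch: lam is fixed rather than chosen adaptively.
    """
    m = len(p_values)
    # Storey's estimate of the proportion of true nulls, capped at 1.
    pi0 = min(1.0, sum(p > lam for p in p_values) / (m * (1.0 - lam)))
    # Indices ordered from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_star = 0
    # Find the largest k whose estimated FDR, pi0 * m * p_(k) / k, is <= alpha.
    for k in range(m, 0, -1):
        if pi0 * m * p_values[order[k - 1]] / k <= alpha:
            k_star = k
            break
    return sorted(order[:k_star]), pi0


if __name__ == "__main__":
    # Hypothetical estimates and standard errors for six regression parameters.
    estimates = [0.90, 0.05, -0.70, 0.02, 0.01, 0.60]
    std_errors = [0.20, 0.20, 0.20, 0.20, 0.20, 0.20]
    p_vals = [wald_p_value(b, s) for b, s in zip(estimates, std_errors)]
    rejected, pi0_hat = storey_reject(p_vals, alpha=0.1, lam=0.5)
    print("rejected indices:", rejected)
    print("estimated null proportion:", pi0_hat)
```

When the estimated null proportion equals 1, the rule reduces to the Benjamini-Hochberg step-up procedure; a smaller estimate makes the threshold less conservative, which is the source of the power gain usually attributed to Storey's approach.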

Similar articles

2
Collaborative double robust targeted maximum likelihood estimation.
Int J Biostat. 2010 May 17;6(1):Article 17. doi: 10.2202/1557-4679.1181.
3
Statistical Inference for High-Dimensional Models via Recursive Online-Score Estimation.
J Am Stat Assoc. 2021;116(535):1307-1318. doi: 10.1080/01621459.2019.1710154. Epub 2020 Jan 23.
4
On pseudolikelihood inference for semiparametric models with boundary problems.
Biometrika. 2017 Mar;104(1):165-179. doi: 10.1093/biomet/asw072. Epub 2017 Feb 18.
7
Testing generalized linear models with high-dimensional nuisance parameter.
Biometrika. 2023 Mar;110(1):83-99. doi: 10.1093/biomet/asac021. Epub 2022 Apr 5.
8
EFFICIENT ESTIMATION IN SUFFICIENT DIMENSION REDUCTION.
Ann Stat. 2013 Feb;41(1):250-268. doi: 10.1214/12-AOS1072SUPP.
9
A note on the estimation and inference with quadratic inference functions for correlated outcomes.
Commun Stat Simul Comput. 2022;51(11):6525-6536. doi: 10.1080/03610918.2020.1805463. Epub 2020 Aug 11.
10
Intrinsic Regression Models for Medial Representation of Subcortical Structures.
J Am Stat Assoc. 2012 Mar 1;107(497):12-23. doi: 10.1080/01621459.2011.643710.

Articles citing this paper

1
Model-Free Statistical Inference on High-Dimensional Data.
J Am Stat Assoc. 2025;120(549):186-197. doi: 10.1080/01621459.2024.2310314. Epub 2024 Mar 8.
2
Variable selection in modelling clustered data via within-cluster resampling.
Can J Stat. 2025 Mar;53(1). doi: 10.1002/cjs.11824. Epub 2024 Aug 1.
3
Optimal Poisson subsampling decorrelated score for high-dimensional generalized linear models.
J Appl Stat. 2024 Feb 11;51(14):2719-2743. doi: 10.1080/02664763.2024.2315467. eCollection 2024.
4
Discussion of 'Statistical inference for streamed longitudinal data'.
Biometrika. 2023 Nov 15;110(4):867-869. doi: 10.1093/biomet/asad043. eCollection 2023 Dec.
5
Marginal false discovery rate for a penalized transformation survival model.
Comput Stat Data Anal. 2021 Aug;160. doi: 10.1016/j.csda.2021.107232. Epub 2021 Apr 2.

References cited in this paper

1
Testing and Confidence Intervals for High Dimensional Proportional Hazards Model.
J R Stat Soc Series B Stat Methodol. 2017 Nov;79(5):1415-1437. doi: 10.1111/rssb.12224. Epub 2016 Dec 26.
2
I-LAMM FOR SPARSE LEARNING: SIMULTANEOUS CONTROL OF ALGORITHMIC COMPLEXITY AND STATISTICAL ERROR.
Ann Stat. 2018 Apr;46(2):814-841. doi: 10.1214/17-AOS1568. Epub 2018 Apr 3.
3
CALIBRATING NON-CONVEX PENALIZED REGRESSION IN ULTRA-HIGH DIMENSION.
Ann Stat. 2013 Oct 1;41(5):2505-2536. doi: 10.1214/13-AOS1159.
4
Penalized generalized estimating equations for high-dimensional longitudinal data analysis.
Biometrics. 2012 Jun;68(2):353-60. doi: 10.1111/j.1541-0420.2011.01678.x. Epub 2011 Sep 28.
6
Interacting genetic loci on chromosomes 20 and 10 influence extreme human obesity.
Am J Hum Genet. 2003 Jan;72(1):115-24. doi: 10.1086/345648. Epub 2002 Dec 11.
