Variable Selection in Nonparametric Varying-Coefficient Models for Analysis of Repeated Measurements.

Authors

Wang Lifeng, Li Hongzhe, Huang Jianhua Z

Affiliation

Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, PA 19104

Publication information

J Am Stat Assoc. 2008 Dec 1;103(484):1556-1569. doi: 10.1198/016214508000000788.

Abstract

Nonparametric varying-coefficient models are commonly used to analyze data measured repeatedly over time, including longitudinal and functional response data. Although many procedures have been developed for estimating the varying coefficients, the problem of variable selection for such models has not been addressed. In this article, we present a regularized estimation procedure for variable selection that combines basis function approximations with the smoothly clipped absolute deviation (SCAD) penalty. The proposed procedure simultaneously selects the significant variables with time-varying effects and estimates the nonzero smooth coefficient functions. Under suitable conditions, we establish the theoretical properties of the procedure, including consistency in variable selection and the oracle property in estimation. Here the oracle property means that the asymptotic distribution of an estimated coefficient function is the same as it would be if the variables in the model were known a priori. The method is illustrated with simulations and two real data examples: one identifying risk factors in a study of AIDS, and one using microarray time-course gene expression data to identify transcription factors related to the yeast cell cycle process.
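The abstract names the SCAD penalty without stating its form. As a rough illustration, the sketch below implements the standard SCAD penalty as defined by Fan and Li (2001), with the conventional default a = 3.7. The helper `group_scad` is hypothetical: it applies the penalty to the Euclidean norm of each block of basis coefficients, so that a whole coefficient function can be shrunk to zero at once. Using the plain Euclidean norm here is an assumption made for illustration and may differ from the norm used in the article.

```python
import math


def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty (Fan and Li, 2001) at a scalar value theta.

    lam is the tuning parameter (lam > 0); a > 2 controls where the
    penalty flattens out. a = 3.7 is the commonly used default.
    """
    t = abs(theta)
    if t <= lam:
        # Lasso-like linear region near zero.
        return lam * t
    if t <= a * lam:
        # Quadratic transition region.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant region: large coefficients receive no extra penalty.
    return lam * lam * (a + 1) / 2


def group_scad(coef_blocks, lam, a=3.7):
    """Hypothetical helper: sum of SCAD penalties on block norms.

    Each block holds the basis coefficients of one coefficient function.
    The Euclidean norm is an assumption made for this sketch.
    """
    total = 0.0
    for block in coef_blocks:
        norm = math.sqrt(sum(c * c for c in block))
        total += scad_penalty(norm, lam, a)
    return total
```

Because the penalty is constant beyond a·λ, large coefficient blocks are not shrunk further, which is the feature usually credited for SCAD's oracle-type behavior. A block whose norm is exactly zero contributes nothing, which corresponds to dropping that variable from the model.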




