Variable Selection in Measurement Error Models

Authors

Ma Yanyuan, Li Runze

Affiliation

Department of Statistics, Texas A&M University, College Station, TX 77843.

Publication

Bernoulli (Andover). 2010;16(1):274-300. doi: 10.3150/09-bej205.

PMID: 20209020

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2832228/

Abstract

Measurement error data or errors-in-variable data are often collected in many studies. Natural criterion functions are often unavailable for general functional measurement error models due to the lack of information on the distribution of the unobservable covariates. Typically, the parameter estimation is via solving estimating equations. In addition, the construction of such estimating equations routinely requires solving integral equations, hence the computation is often much more intensive compared with ordinary regression models. Because of these difficulties, traditional best subset variable selection procedures are not applicable, and in the measurement error model context, variable selection remains an unsolved issue. In this paper, we develop a framework for variable selection in measurement error models via penalized estimating equations. We first propose a class of selection procedures for general parametric measurement error models and for general semiparametric measurement error models, and study the asymptotic properties of the proposed procedures. Then, under certain regularity conditions and with a properly chosen regularization parameter, we demonstrate that the proposed procedure performs as well as an oracle procedure. We assess the finite sample performance via Monte Carlo simulation studies and illustrate the proposed methodology through the empirical analysis of a familiar data set.
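The abstract does not write out a penalized estimating equation. As a rough sketch of the general form such equations usually take in the penalized-estimation literature, and not necessarily the authors' exact formulation, one can write the display below. The symbols $S$, $W_i$, $Y_i$, $\beta$, $d$, $p_\lambda$, and $a$ are notation assumed here for illustration: $S$ stands for an unpenalized estimating function evaluated at the observed, error-prone covariates $W_i$ and response $Y_i$, and $p_\lambda$ for a penalty function with regularization parameter $\lambda$. The SCAD derivative is included only as one common choice of penalty.

```latex
% Generic penalized estimating equation (illustrative notation, assumed)
\sum_{i=1}^{n} S(W_i, Y_i; \beta) - n\,\dot p_\lambda(\beta) = 0,
\qquad
\dot p_\lambda(\beta) =
  \bigl( p'_\lambda(|\beta_1|)\,\mathrm{sign}(\beta_1), \dots,
         p'_\lambda(|\beta_d|)\,\mathrm{sign}(\beta_d) \bigr)^{\top}.

% One common penalty choice: SCAD derivative, for \theta > 0 and a > 2
p'_\lambda(\theta) = \lambda \Bigl\{ I(\theta \le \lambda)
  + \frac{(a\lambda - \theta)_+}{(a-1)\lambda}\, I(\theta > \lambda) \Bigr\}.
```

For $\theta \ge a\lambda$ the SCAD derivative is zero, so sufficiently large coefficients are left unpenalized while small coefficients are pushed toward zero.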
