Regression analysis of longitudinal binary data with time-dependent environmental covariates: bias and efficiency.

Author information

Schildcrout Jonathan S, Heagerty Patrick J

Affiliation

Department of Biostatistics, Vanderbilt University, S-2323 Medical Center North, Nashville, TN 37232-2158, USA.

Publication information

Biostatistics. 2005 Oct;6(4):633-52. doi: 10.1093/biostatistics/kxi033. Epub 2005 May 25.

Abstract

Generalized estimating equations (Liang and Zeger, 1986) is a widely used, moment-based procedure to estimate marginal regression parameters. However, a subtle and often overlooked point is that valid inference requires the mean for the response at time t to be expressed properly as a function of the complete past, present, and future values of any time-varying covariate. For example, with environmental exposures it may be necessary to express the response as a function of multiple lagged values of the covariate series. Although multiple lagged covariates may be predictive of outcomes, researchers often focus on parameters in a 'cross-sectional' model, where the response is expressed as a function of a single lag in the covariate series. Cross-sectional models yield parameters with simple interpretations and avoid the collinearity issues associated with multiple lagged values of a covariate. Pepe and Anderson (1994) showed that parameter estimates for time-varying covariates may be biased unless the mean, given all past, present, and future covariate values, equals the cross-sectional mean, or unless independence estimating equations are used. Although working independence avoids this potential bias, many authors have shown that a poor choice of response correlation model can lead to highly inefficient parameter estimates. The purpose of this paper is to study the bias-efficiency trade-off associated with the choice of working correlation when analyzing binary response data. We investigate data characteristics and design features (e.g. cluster size, overall response association, functional form of the response association, covariate distribution, and others) that influence the small- and large-sample properties of parameter estimates obtained from several different weighting schemes or, equivalently, 'working' covariance models. We find that the impact of covariance model choice depends highly on the specific structure of the data features, and that key aspects should be examined before choosing a weighting scheme.
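The abstract's central point is that a cross-sectional mean model can differ from the mean given the whole covariate series. This can be checked numerically in a toy setting. The sketch below illustrates the idea under stated assumptions and does not reproduce the authors' simulation design. It assumes the following:

- A binary exposure follows a stationary, symmetric two-state Markov chain with a hypothetical persistence probability `stay`.
- The response follows a hypothetical lag-1 logistic model with coefficients `b0`, `b1`, `b2`, so that E[Y_t | all X] = expit(b0 + b1·X_t + b2·X_{t-1}).

The function `pepe_anderson_gap` reports how far E[Y_t | X_t] is from that full conditional mean. The gap is zero when `b2 = 0`. When `b2 ≠ 0` the condition fails, and in that case the Pepe and Anderson (1994) result quoted above means only working independence avoids the potential bias for the cross-sectional coefficient.

```python
import math


def expit(z: float) -> float:
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def full_mean(x_t: int, x_lag: int, b0: float, b1: float, b2: float) -> float:
    """E[Y_t | whole covariate series] under the ASSUMED lag-1 logistic model:
    only X_t and X_{t-1} enter the linear predictor (hypothetical model)."""
    return expit(b0 + b1 * x_t + b2 * x_lag)


def cross_sectional_mean(x_t: int, stay: float, b0: float, b1: float, b2: float) -> float:
    """E[Y_t | X_t = x_t], obtained by averaging full_mean over P(X_{t-1} | X_t).

    ASSUMPTION: X is a stationary symmetric two-state Markov chain with
    P(X_t = X_{t-1}) = stay. Such a chain is reversible, so
    P(X_{t-1} = x_t | X_t = x_t) = stay as well.
    """
    same = full_mean(x_t, x_t, b0, b1, b2)
    switched = full_mean(x_t, 1 - x_t, b0, b1, b2)
    return stay * same + (1.0 - stay) * switched


def pepe_anderson_gap(stay: float, b0: float, b1: float, b2: float) -> float:
    """Largest |E[Y_t | X] - E[Y_t | X_t]| over the four (X_t, X_{t-1}) cells.

    A value of zero means the cross-sectional mean equals the full conditional
    mean, i.e. the Pepe-Anderson condition holds in this toy model.
    """
    gaps = []
    for x_t in (0, 1):
        cs = cross_sectional_mean(x_t, stay, b0, b1, b2)
        for x_lag in (0, 1):
            gaps.append(abs(full_mean(x_t, x_lag, b0, b1, b2) - cs))
    return max(gaps)


def cross_sectional_log_or(stay: float, b0: float, b1: float, b2: float) -> float:
    """Log odds ratio for X_t = 1 versus X_t = 0 in the cross-sectional model."""
    return (logit(cross_sectional_mean(1, stay, b0, b1, b2))
            - logit(cross_sectional_mean(0, stay, b0, b1, b2)))


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    for b2 in (0.0, 0.5):
        gap = pepe_anderson_gap(stay=0.8, b0=-1.0, b1=0.5, b2=b2)
        lor = cross_sectional_log_or(stay=0.8, b0=-1.0, b1=0.5, b2=b2)
        print(f"b2={b2}: max gap={gap:.4f}, cross-sectional log OR={lor:.4f} (b1=0.5)")
```

With a positively autocorrelated exposure (`stay = 0.8`) and `b2 = 0.5`, the cross-sectional log odds ratio absorbs part of the lag effect and is larger than `b1`. This shows that the cross-sectional coefficient is a different target from the coefficients of the lagged model.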
