Grøn Randi, Gerds Thomas A, Andersen Per K
Section of Biostatistics, University of Copenhagen, Copenhagen, Denmark.
Stat Med. 2016 Mar 30;35(7):1117-29. doi: 10.1002/sim.6755. Epub 2015 Sep 30.
Poisson regression is an important tool in register-based epidemiology, where it is used to study the association between exposure variables and event rates. In this paper, we discuss the 'large n and small p' situation, where n is the sample size and p is the number of available covariates. Specifically, we are concerned with modeling options when there are time-varying covariates that can have time-varying effects. One problem is that tests of the proportional hazards assumption, of no interaction between exposure and other observed variables, or of other modeling assumptions have high power because of the large sample size, and will often indicate statistical significance even for numerically small deviations that are unimportant for the subject matter. Another problem is that information on important confounders may be unavailable. In practice, this situation may lead to simple working models that are likely to be misspecified. To support and improve conclusions drawn from such models, we discuss methods for sensitivity analysis, methods for estimating average exposure effects from aggregated data, and a semi-parametric bootstrap method for obtaining robust standard errors. The methods are illustrated using data from the Danish national registries to investigate diabetes incidence among individuals treated with antipsychotics compared with the unexposed general population.
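As a rough illustration of what estimating an exposure effect from aggregated data can look like, the sketch below computes a crude rate ratio from event counts and person-years in an exposed and an unexposed group. This coincides with the maximum likelihood estimate from a Poisson model with a log person-time offset and a single binary exposure. It is not the authors' method: the function name `rate_ratio` and all counts are hypothetical, and the Wald interval shown is the standard model-based one, not the semi-parametric bootstrap described in the abstract.

```python
import math


def rate_ratio(events_exposed, pyears_exposed,
               events_unexposed, pyears_unexposed, z=1.96):
    """Crude rate ratio with a Wald confidence interval on the log scale.

    Equivalent to the MLE of exp(beta) in a Poisson model
    log E[events] = log(person-years) + alpha + beta * exposed.
    """
    rate_exposed = events_exposed / pyears_exposed
    rate_unexposed = events_unexposed / pyears_unexposed
    log_rr = math.log(rate_exposed / rate_unexposed)
    # Model-based standard error of the log rate ratio
    se = math.sqrt(1.0 / events_exposed + 1.0 / events_unexposed)
    return (math.exp(log_rr),
            math.exp(log_rr - z * se),
            math.exp(log_rr + z * se))


# Hypothetical aggregated counts (NOT taken from the Danish registries)
rr, lower, upper = rate_ratio(120, 10_000.0, 900, 150_000.0)
print(f"rate ratio = {rr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

With registry-sized data the model-based interval becomes very narrow, which is one reason a working model that is likely misspecified calls for sensitivity analysis and robust standard errors.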