Post Justin B, Bondell Howard D
Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695-8203, USA.
Biometrics. 2013 Mar;69(1):70-9. doi: 10.1111/j.1541-0420.2012.01810.x. Epub 2013 Jan 17.
When faced with categorical predictors and a continuous response, the objective of an analysis often consists of two tasks: finding which factors are important and determining which levels of the factors differ significantly from one another. Often, these tasks are done separately, using Analysis of Variance (ANOVA) followed by a post hoc hypothesis testing procedure such as Tukey's Honestly Significant Difference test. When interactions between factors are included in the model, collapsing the levels of a factor becomes a more difficult problem. When testing for differences between two levels of a factor, claiming no difference refers not only to equality of the main effects, but also to equality of each interaction involving those levels. This structure between the main effects and interactions in a model is similar to the idea of heredity used in regression models. This article introduces a new method for accomplishing both of the common analysis tasks simultaneously in an interaction model while also adhering to the heredity-type constraint on the model. An appropriate penalization is constructed that encourages levels of factors to collapse and entire factors to be set to zero. It is shown that the procedure has the oracle property, implying that asymptotically it performs as well as if the exact structure were known beforehand. We also discuss the application to estimating interactions in the unreplicated case. Simulation studies show that the procedure outperforms post hoc hypothesis testing procedures as well as similar methods that do not include a structural constraint. The method is also illustrated using a real data example.
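The heredity-type constraint described above can be made concrete with a small sketch. It is not the authors' estimator or penalty; it only illustrates the stated condition that two levels of a factor can be treated as equal when their main effects agree and every interaction involving them agrees. The names `pair_group`, `group_size`, and `collapsible_pairs`, the tolerance `tol`, and the effect values are all hypothetical, chosen for illustration. Measuring each group of differences by its largest absolute entry is likewise an assumption of this sketch.

```python
from itertools import combinations


def pair_group(alpha, gamma, i, j):
    """Differences that must all vanish for levels i and j of factor A to collapse.

    alpha: main effects of factor A, one value per level.
    gamma: A-by-B interaction effects, one row per level of A.
    """
    diffs = [alpha[i] - alpha[j]]
    diffs += [gi - gj for gi, gj in zip(gamma[i], gamma[j])]
    return diffs


def group_size(alpha, gamma, i, j):
    """Largest absolute difference in the group (an assumed, illustrative measure)."""
    return max(abs(d) for d in pair_group(alpha, gamma, i, j))


def collapsible_pairs(alpha, gamma, tol=1e-8):
    """Pairs of levels whose main effects AND interactions all agree within tol."""
    return [
        (i, j)
        for i, j in combinations(range(len(alpha)), 2)
        if group_size(alpha, gamma, i, j) <= tol
    ]


# Hypothetical effects: factor A has 3 levels, factor B has 2 levels.
alpha = [1.0, 1.0, 1.0]
gamma = [
    [0.5, -0.5],
    [0.5, -0.5],
    [0.0, 0.0],
]

pairs = collapsible_pairs(alpha, gamma)
print(pairs)  # [(0, 1)]
```

In this toy setup all three levels share the same main effect, yet only levels 0 and 1 collapse, because level 2 differs in its interactions. A comparison based on main effects alone would miss that distinction.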