Cantoni E, Field C, Mills Flemming J, Ronchetti E
Department of Econometrics, University of Geneva, CH-1211 Geneva 4, Switzerland.
Stat Med. 2007 Feb 20;26(4):919-30. doi: 10.1002/sim.2572.
Longitudinal models are commonly used to study data collected repeatedly on individuals over time. While a variety of such models are now available (marginal models, mixed effects models, etc.), far fewer options exist for the closely related issue of variable selection. In addition, longitudinal data typically derive from medical or other large-scale studies, where large numbers of potential explanatory variables, and hence even larger numbers of candidate models, must often be considered. Cross-validation is a popular method for variable selection based on the predictive ability of the model. Here, we propose a cross-validation Markov chain Monte Carlo procedure as a general variable selection tool which avoids the need to visit all candidate models. Inclusion of a 'one-standard-error' rule provides users with a collection of good models rather than a single best model, as is often desired. We demonstrate the effectiveness of our procedure both in a simulation setting and in a real application.
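The abstract does not give the details of the procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes ordinary least squares as a working model, cross-validation folds formed over subjects, a Metropolis-Hastings chain over variable-inclusion indicators whose target is proportional to exp(-CV error / temperature), and a one-standard-error rule applied to the models the chain has visited. All function names, the `temperature` parameter, and the simulated data are hypothetical.

```python
import numpy as np

def cv_error(X, y, groups, subset, n_folds=5):
    """Mean and standard error of the K-fold squared prediction error for
    OLS on the variables flagged in `subset`. Folds are formed over
    subjects so that a subject's repeated measures stay together."""
    subjects = np.unique(groups)
    fold_of_subject = np.arange(len(subjects)) % n_folds
    fold = fold_of_subject[np.searchsorted(subjects, groups)]
    cols = [i for i, keep in enumerate(subset) if keep]
    Z = np.column_stack([np.ones(len(y))] + [X[:, i] for i in cols])
    errs = []
    for k in range(n_folds):
        test = fold == k
        beta, *_ = np.linalg.lstsq(Z[~test], y[~test], rcond=None)
        errs.append(np.mean((y[test] - Z[test] @ beta) ** 2))
    errs = np.array(errs)
    return errs.mean(), errs.std(ddof=1) / np.sqrt(n_folds)

def cv_mcmc(X, y, groups, n_iter=2000, temperature=0.05, seed=0):
    """Random walk over models: flip one inclusion indicator per step and
    accept with the Metropolis-Hastings rule (the proposal is symmetric).
    Returns a dict {model: (cv_mean, cv_se)} of every model visited."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    cache = {}

    def score(model):
        if model not in cache:  # each model is cross-validated only once
            cache[model] = cv_error(X, y, groups, model)
        return cache[model][0]

    current = tuple([False] * p)  # start from the intercept-only model
    for _ in range(n_iter):
        j = int(rng.integers(p))
        proposal = tuple(v != (i == j) for i, v in enumerate(current))
        log_ratio = -(score(proposal) - score(current)) / temperature
        if np.log(rng.random()) < log_ratio:
            current = proposal
    return cache

def one_se_models(cache):
    """Best visited model plus all visited models whose CV error lies
    within one standard error of the best model's CV error."""
    best = min(cache, key=lambda m: cache[m][0])
    threshold = cache[best][0] + cache[best][1]
    good = [m for m in cache if cache[m][0] <= threshold]
    return best, sorted(good, key=lambda m: cache[m][0])

# Simulated longitudinal data (hypothetical): 40 subjects, 5 visits each,
# 8 candidate covariates of which only the first two affect the response.
rng = np.random.default_rng(1)
n_subjects, n_visits, p = 40, 5, 8
groups = np.repeat(np.arange(n_subjects), n_visits)
X = rng.normal(size=(n_subjects * n_visits, p))
subject_effect = rng.normal(scale=0.5, size=n_subjects)[groups]
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + subject_effect + rng.normal(size=len(groups))

cache = cv_mcmc(X, y, groups)
best, good = one_se_models(cache)
print("models visited:", len(cache), "of", 2 ** p)
print("best model includes variables:", [i for i, k in enumerate(best) if k])
print("models within one standard error:", len(good))
```

Because the chain concentrates on models with low cross-validation error, it typically evaluates only a fraction of the 2^p candidates, and the cache of visited models is what the one-standard-error rule is applied to. A real longitudinal analysis would replace the OLS fit with a marginal or mixed effects model.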