Donohue M C, Overholser R, Xu R, Vaida F
Division of Biostatistics and Bioinformatics, Department of Family and Preventive Medicine, University of California, San Diego, CA 92093, U.S.A.
Biometrika. 2011 Sep;98(3):685-700. doi: 10.1093/biomet/asr023. Epub 2011 Jul 13.
We study model selection for clustered data when the focus is on cluster-specific inference. Such data are often modelled using random effects, and conditional Akaike information was proposed in Vaida & Blanchard (2005) and used to derive an information criterion under linear mixed models. Here we extend the approach to generalized linear and proportional hazards mixed models. Outside normal linear mixed models, exact calculations are not available and we resort to asymptotic approximations. In the presence of nuisance parameters, a profile conditional Akaike information is proposed. Bootstrap methods are considered for their potential advantage in finite samples. Simulations show that the bootstrap and analytic criteria perform comparably, with the bootstrap demonstrating some advantages for larger cluster sizes. The proposed criteria are applied to two cancer datasets to select models when cluster-specific inference is of interest.
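To make the idea of a conditional criterion concrete, the following is a minimal sketch for the simplest linear case only, not the generalized linear or proportional hazards extensions described in the abstract. It assumes a balanced random-intercept model y_ij = mu + b_i + e_ij with both variance components treated as known, in which case the penalty is commonly taken to be the trace of the matrix mapping the data to the cluster-specific fitted values. The function name, argument names, and toy data are hypothetical, and the penalty would need adjusting when the variances are estimated.

```python
import math


def caic_balanced_random_intercept(groups, sigma2, tau2):
    """Conditional AIC sketch for y_ij = mu + b_i + e_ij (balanced clusters).

    Assumptions (illustrative, not taken from the paper):
      - sigma2 = Var(e_ij) and tau2 = Var(b_i) are treated as known;
      - every cluster has the same number of observations.
    Returns (caic, rho), where rho is the effective number of parameters.
    """
    m = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch requires balanced clusters")

    # Shrinkage factor applied to each cluster mean's deviation from the grand mean.
    lam = n * tau2 / (sigma2 + n * tau2)

    cluster_means = [sum(g) / n for g in groups]
    grand_mean = sum(cluster_means) / m

    # Cluster-specific fitted values: estimated mean plus predicted random intercept.
    fitted = [grand_mean + lam * (cm - grand_mean) for cm in cluster_means]

    # Conditional log-likelihood: normal density of y given the predicted b_i.
    rss = sum((y - f) ** 2 for g, f in zip(groups, fitted) for y in g)
    total = m * n
    cond_loglik = -0.5 * total * math.log(2 * math.pi * sigma2) - 0.5 * rss / sigma2

    # Trace of the hat matrix for this balanced model: 1 + (m - 1) * lam.
    rho = 1 + (m - 1) * lam

    return -2.0 * cond_loglik + 2.0 * rho, rho


# Toy data: three clusters of two observations each (made up for illustration).
toy = [[1.0, 2.0], [3.0, 5.0], [2.0, 2.0]]
caic, rho = caic_balanced_random_intercept(toy, sigma2=1.0, tau2=1.0)
```

In this sketch the effective number of parameters moves between 1 (when tau2 is 0, so the random intercepts vanish) and the number of clusters (as tau2 grows, so each cluster effectively gets its own mean). That range is what distinguishes a conditional criterion from one that counts only the fixed effects and variance components.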