Dajles Andres, Cavanaugh Joseph
Department of Biostatistics, University of Iowa, 145 N. Riverside Drive, Iowa City, IA 52242, USA.
Entropy (Basel). 2024 Jul 15;26(7):599. doi: 10.3390/e26070599.
Most statistical modeling applications involve the consideration of a candidate collection of models based on various sets of explanatory variables. The candidate models may also differ in terms of the structural formulations for the systematic component and the posited probability distributions for the random component. A common practice is to use an information criterion to select a model from the collection that provides an optimal balance between fidelity to the data and parsimony. The analyst then typically proceeds as if the chosen model were the only model ever considered. However, such a practice fails to account for the variability inherent in the model selection process, which can lead to inappropriate inferential results and conclusions. In recent years, inferential methods have been proposed for multimodel frameworks that attempt to provide an appropriate accounting of modeling uncertainty. In the frequentist paradigm, such methods should ideally involve model selection probabilities, i.e., the relative frequencies of selection for each candidate model based on repeated sampling. Model selection probabilities can be conveniently approximated through bootstrapping. When the Akaike information criterion is employed, Akaike weights are also commonly used as a surrogate for selection probabilities. In this work, we show that the conventional bootstrap approach for approximating model selection probabilities is biased. We propose a simple correction to adjust for this bias. We also argue that Akaike weights do not provide adequate approximations for selection probabilities, although they do provide a crude gauge of model plausibility.
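To make the two quantities discussed above concrete, the following is a minimal sketch of (a) Akaike weights and (b) the conventional, uncorrected bootstrap approximation of model selection probabilities. It is an illustration under stated assumptions, not the authors' implementation: the three candidate models (normal, zero-mean normal, Laplace), the simulated data, and all function names are hypothetical choices made here. The bias correction proposed in the paper is not reproduced, since the abstract does not specify it.

```python
import math
import random
import statistics

# Hypothetical candidate models for an i.i.d. sample; each function returns
# AIC = -2 * (maximized log-likelihood) + 2 * (number of fitted parameters).

def aic_normal(x):
    """Normal model, mean and variance both estimated by maximum likelihood."""
    n = len(x)
    var = statistics.pvariance(x)  # ML estimate of the variance (divides by n)
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    return -2 * loglik + 2 * 2

def aic_normal_zero_mean(x):
    """Normal model with the mean fixed at zero; only the variance is estimated."""
    n = len(x)
    var = sum(v * v for v in x) / n
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    return -2 * loglik + 2 * 1

def aic_laplace(x):
    """Laplace model, location (median) and scale estimated by maximum likelihood."""
    n = len(x)
    med = statistics.median(x)
    b = sum(abs(v - med) for v in x) / n
    loglik = -n * (math.log(2 * b) + 1)
    return -2 * loglik + 2 * 2

CANDIDATES = [aic_normal, aic_normal_zero_mean, aic_laplace]

def akaike_weights(aics):
    """w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2), with delta_i = AIC_i - min AIC."""
    best = min(aics)
    raw = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(raw)
    return [r / total for r in raw]

def bootstrap_selection_freq(x, candidates, n_boot=500, seed=0):
    """Conventional bootstrap: share of resamples in which each model has the lowest AIC."""
    rng = random.Random(seed)
    counts = [0] * len(candidates)
    for _ in range(n_boot):
        resample = rng.choices(x, k=len(x))  # draw n observations with replacement
        aics = [fit(resample) for fit in candidates]
        counts[aics.index(min(aics))] += 1
    return [c / n_boot for c in counts]

# Simulated data (an assumption for illustration only).
data_rng = random.Random(1)
x = [data_rng.gauss(0.3, 1.0) for _ in range(60)]

weights = akaike_weights([fit(x) for fit in CANDIDATES])
freqs = bootstrap_selection_freq(x, CANDIDATES)
```

Comparing `weights` with `freqs` on the same sample shows that the two are different objects: the weights are a deterministic function of the observed AIC differences, whereas the bootstrap frequencies estimate how often each model would be selected under resampling.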