Boulesteix Anne-Laure
Department of Medical Statistics and Epidemiology, Technical University of Munich, Ismaningerstr. 22, D-81675 Munich, Germany.
Biom J. 2006 Aug;48(5):838-48. doi: 10.1002/bimj.200510191.
We address the problem of maximally selected chi-square statistics in the case of a binary Y variable and a nominal X variable with several categories. The distribution of the maximally selected chi-square statistic has already been derived when the best cutpoint is chosen from a continuous or an ordinal X, but not when the best split is chosen from a nominal X. In this paper, we derive the exact distribution of the maximally selected chi-square statistic in this case using a combinatorial approach. Applications of the derived distribution to variable selection and hypothesis testing are discussed based on simulations. As an illustration, our method is applied to a birth data set.
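To make the statistic concrete, the following is a minimal sketch of how a maximally selected chi-square statistic could be computed for a nominal X with K categories and a binary Y. It enumerates all 2^(K-1) - 1 binary partitions of the categories and keeps the largest Pearson chi-square. The function names and the input layout (a dict of per-category counts) are assumptions made for illustration. The sketch covers only the statistic itself, not the exact distribution derived in the paper.

```python
from itertools import combinations


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Degenerate table (an empty row or column): treat as no association.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def max_selected_chi_square(counts):
    """Maximize the chi-square statistic over all binary splits of a nominal X.

    counts: dict mapping category label -> (count with Y=0, count with Y=1).
    Returns (largest statistic, tuple of categories on one side of the best split).
    """
    cats = sorted(counts)
    total0 = sum(v[0] for v in counts.values())
    total1 = sum(v[1] for v in counts.values())
    first, rest = cats[0], cats[1:]
    best_stat, best_subset = 0.0, None
    # Keep the first category on the left side so each partition is visited once;
    # r stops short of len(rest) so the right side is never empty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            left = (first,) + extra
            a = sum(counts[c][0] for c in left)
            b = sum(counts[c][1] for c in left)
            stat = chi_square_2x2(a, b, total0 - a, total1 - b)
            if stat > best_stat:
                best_stat, best_subset = stat, left
    return best_stat, best_subset


# Hypothetical toy data: categories A and C contain only Y=0, B contains only Y=1.
toy_counts = {"A": (10, 0), "B": (0, 10), "C": (10, 0)}
stat, subset = max_selected_chi_square(toy_counts)
```

Because the split is chosen to maximize the statistic, comparing the result with an ordinary chi-square reference distribution with one degree of freedom would overstate significance, which is the motivation for deriving the distribution of the maximum.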