Quintana Fernando A, Müller Peter, Rosner Gary L, Munsell Mark
Departamento de Estadística, Pontificia Universidad Católica de Chile, Santiago, CHILE.
Bayesian Anal. 2008;3(2):317-338. doi: 10.1214/08-BA312.
We analyze complete sequences of successes (hits, walks, and sacrifices) for a group of players from the American and National Leagues, collected over 4 seasons. The goal is to describe how players' performances vary from season to season. In particular, we wish to assess and compare the effect of available occasion-specific covariates over seasons. The data are binary sequences for each player and each season. We model dependence in the binary sequence by an autoregressive logistic model. The model includes lagged terms up to a fixed order. For each player and season we introduce a different set of autologistic regression coefficients, i.e., the regression coefficients are random effects that are specific to each season and player. We use a nonparametric approach to define a random effects distribution. The nonparametric model is defined as a mixture with a Dirichlet process prior for the mixing measure. The described model is justified by a representation theorem for order-k exchangeable sequences. Besides the repeated measurements for each season and player, multiple seasons within a given player define an additional level of repeated measurements. We introduce dependence at this level of repeated measurements by relating the season-specific random effects vectors in an autoregressive fashion. We ultimately conclude that while some covariates like the ERA of the opposing pitcher are always relevant, others like an indicator for the game being into the seventh inning may be significant only for certain seasons, and some others, like the score of the game, can safely be ignored.
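As a rough illustration of the autoregressive logistic component described above, the following Python sketch computes the success probability for one occasion from the previous k outcomes and the occasion-specific covariates, then simulates a binary sequence. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the coefficient values, and the zero-padding of the first k occasions are all hypothetical, and the Dirichlet process random effects and the season-to-season dependence are not shown.

```python
import math
import random


def success_probability(history, covariates, intercept, lag_coefs, cov_coefs):
    """P(y_t = 1 | last k outcomes, covariates) for an order-k autologistic model.

    All argument names are illustrative; k is len(lag_coefs).
    """
    k = len(lag_coefs)
    # Most recent outcome first. Early occasions have fewer than k
    # predecessors; padding with zeros is an assumption of this sketch.
    recent = list(reversed(history[-k:])) if k else []
    recent += [0] * (k - len(recent))
    # Linear predictor: intercept + lagged outcomes + covariate effects.
    eta = intercept
    eta += sum(b * y for b, y in zip(lag_coefs, recent))
    eta += sum(g * x for g, x in zip(cov_coefs, covariates))
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_sequence(covariate_rows, intercept, lag_coefs, cov_coefs, seed=0):
    """Draw one binary sequence, one outcome per row of covariates."""
    rng = random.Random(seed)
    outcomes = []
    for x in covariate_rows:
        p = success_probability(outcomes, x, intercept, lag_coefs, cov_coefs)
        outcomes.append(1 if rng.random() < p else 0)
    return outcomes


if __name__ == "__main__":
    # Hypothetical covariate per occasion (e.g. a standardized opposing-pitcher ERA).
    rows = [[0.3], [-0.1], [0.8], [0.0], [-0.5]]
    print(simulate_sequence(rows, intercept=-0.7, lag_coefs=[0.4, 0.2], cov_coefs=[0.5]))
```

In the model summarized in the abstract, each player and season would have its own coefficient vector (the intercept, lag, and covariate coefficients here), drawn from the nonparametric random effects distribution rather than fixed as in this sketch.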