Chang Joshua C
Epidemiology and Biostatistics Section, Rehabilitation Medicine Department, The National Institutes of Health, Clinical Center, Bethesda, MD 20892, USA.
R Soc Open Sci. 2019 Mar 20;6(3):182174. doi: 10.1098/rsos.182174. eCollection 2019 Mar.
Consider the problem of modelling memory effects in discrete-state random walks using higher-order Markov chains. This paper explores cross-validation and information criteria as proxies for a model's predictive accuracy. Our objective is to select, from data, the number of prior states of recent history upon which a trajectory is statistically dependent. Through simulations, I evaluate these criteria in the case where data are drawn from systems with fixed orders of history, noting trends in the relative performance of the criteria. As a real-world illustration of these methods, this manuscript examines the problem of detecting statistical dependencies in shot outcomes in free throw shooting. Over the three National Basketball Association (NBA) seasons analysed, several players exhibited statistical dependencies in free throw hitting probability of various types: hot handedness, cold handedness and error correction. For the 2013-2014 to 2015-2016 NBA seasons, I detected statistical dependencies in 23% of all player-seasons. Focusing on a single player, in two of these three seasons, LeBron James shot a better percentage immediately after a miss than otherwise. Conditioning on the previous outcome yields a more predictive model than treating free throw makes as independent. For LeBron James' 2016-2017 season, however, a model that depends on the previous shot (single-step Markovian) does not clearly outperform a model with independent outcomes. An error-correcting, two-parameter variable-length model, in which James shoots a higher percentage after a missed free throw than otherwise, is more predictive than either model.
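The order-selection idea described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation. It fits a k-th order Markov chain by maximum likelihood (empirical transition frequencies), scores each order with AIC and BIC using the common count of m^k(m-1) free transition probabilities for an m-state chain, and uses leave-one-transition-out cross-validation with additive (Laplace) smoothing as a stand-in for whatever cross-validation scheme the paper actually uses. The function names (`fit_counts`, `log_likelihood`, `information_criteria`, `loo_cv_score`, `select_order`), the smoothing constant `alpha`, and the toy make/miss sequence are all hypothetical. Every order is scored on the same transitions (those after the first `max_order` observations) so that the criteria compare like with like.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple


def fit_counts(seq: Sequence[int], order: int, start: int) -> Tuple[Counter, Counter]:
    """Count (history, next state) pairs and history totals for t = start .. len(seq) - 1."""
    pair_counts: Counter = Counter()
    hist_counts: Counter = Counter()
    for t in range(start, len(seq)):
        history = tuple(seq[t - order:t])  # the `order` most recent states; () when order == 0
        pair_counts[(history, seq[t])] += 1
        hist_counts[history] += 1
    return pair_counts, hist_counts


def log_likelihood(seq: Sequence[int], order: int, start: int) -> float:
    """Maximized log-likelihood: transition probabilities set to empirical frequencies."""
    pair_counts, hist_counts = fit_counts(seq, order, start)
    return sum(n * math.log(n / hist_counts[h]) for (h, _), n in pair_counts.items())


def information_criteria(seq: Sequence[int], order: int, start: int,
                         num_states: int = 2) -> Dict[str, float]:
    """AIC and BIC, counting num_states**order * (num_states - 1) free parameters (assumed)."""
    n = len(seq) - start  # number of predicted transitions
    k = num_states ** order * (num_states - 1)
    ll = log_likelihood(seq, order, start)
    return {"AIC": -2.0 * ll + 2.0 * k, "BIC": -2.0 * ll + k * math.log(n)}


def loo_cv_score(seq: Sequence[int], order: int, start: int,
                 num_states: int = 2, alpha: float = 1.0) -> float:
    """Leave-one-transition-out log predictive score with additive smoothing alpha."""
    pair_counts, hist_counts = fit_counts(seq, order, start)
    score = 0.0
    for t in range(start, len(seq)):
        h = tuple(seq[t - order:t])
        # Drop transition t from the counts, then predict it from the remaining ones.
        num = pair_counts[(h, seq[t])] - 1 + alpha
        den = hist_counts[h] - 1 + num_states * alpha
        score += math.log(num / den)
    return score


def select_order(seq: Sequence[int], max_order: int, num_states: int = 2) -> Dict[str, int]:
    """Pick the order minimizing AIC/BIC and the order maximizing the LOO-CV score."""
    start = max_order  # every order predicts the same transitions t >= max_order
    orders = list(range(max_order + 1))
    ic = {k: information_criteria(seq, k, start, num_states) for k in orders}
    cv = {k: loo_cv_score(seq, k, start, num_states) for k in orders}
    return {
        "AIC": min(orders, key=lambda k: ic[k]["AIC"]),
        "BIC": min(orders, key=lambda k: ic[k]["BIC"]),
        "LOO-CV": max(orders, key=lambda k: cv[k]),  # ties go to the lower order
    }


if __name__ == "__main__":
    # Hypothetical make (1) / miss (0) sequence whose next outcome is fixed by the
    # previous two outcomes, i.e. a second-order dependence.
    shots = [0, 0, 1] * 20
    print(select_order(shots, max_order=3))
```

On this deterministic toy sequence, AIC and BIC both prefer order 2: orders 2 and 3 fit the data equally well, but order 3 pays for twice as many parameters. Real free throw data are far noisier, which is where the relative behaviour of the criteria studied in the paper matters.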