Sinharay Sandip
Educational Testing Service, Princeton, NJ, USA.
Appl Psychol Meas. 2022 Jan;46(1):19-39. doi: 10.1177/01466216211049209. Epub 2021 Oct 23.
Drasgow, Levine, and Zickar (1996) suggested a statistic based on the Neyman-Pearson lemma (NPL; e.g., Lehmann & Romano, 2005, p. 60) for detecting preknowledge on a known set of items. The statistic is a special case of the optimal appropriateness indices (OAIs) of Levine and Drasgow (1988) and is the most powerful statistic for detecting item preknowledge when the assumptions underlying the statistic hold for the data (e.g., Belov, 2016; Drasgow et al., 1996). This paper demonstrated using real data analysis that one assumption underlying the statistic of Drasgow et al. (1996) is often likely to be violated in practice. This paper also demonstrated, using simulated data, that the statistic is not robust to realistic violations of its underlying assumptions. Together, the results from the real data and the simulations demonstrate that the statistic of Drasgow et al. (1996) may not always be the optimum statistic in practice and occasionally has smaller power than another statistic for detecting preknowledge on a known set of items, especially when the assumptions underlying the former statistic do not hold. The findings of this paper demonstrate the importance of keeping in mind the assumptions underlying and the limitations of any statistic or method.