Metsämuuronen Jari, Niemensivu Timi
Faculty of Science, Turku Research Institute for Learning Analytics, University of Turku, Turku, Finland.
Appl Psychol Meas. 2025 Jun 24:01466216251350159. doi: 10.1177/01466216251350159.
Communicating the factual meaning of a particular reliability estimate is sometimes difficult. What does a specific reliability estimate of 0.80 or 0.95 mean in common language? Deflation-corrected estimates of reliability (DCER) using Somers' D or Goodman-Kruskal G as the item-score correlations are transformed into forms where specific estimates from the family of common language effect sizes are visible. This makes it possible to communicate reliability estimates using a common language and to evaluate the magnitude of a particular reliability estimate in the same way and with the same metric as we do with effect size estimates. Using a DCER, we can say that with k = 40 items, if the reliability is 0.95, in 80 out of 100 random pairs of test takers from different subpopulations on all items combined, those with a higher item response will also score higher on the test. In this case, using the thresholds familiar from effect sizes, we can say that the reliability is "very high." The transformation of the reliability estimate into a common language effect size depends on the size of the item-score association estimates and the number of items, so no closed-form equations for the transformations are given. However, relevant thresholds are provided for practical use.
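The common-language reading rests on counting concordant and discordant pairs of test takers. The sketch below illustrates that building block for a single item only. It is not the DCER transformation itself, which the abstract says has no closed form. It uses the standard definition of Goodman-Kruskal gamma, (C - D) / (C + D), from which the share of concordant pairs among untied pairs follows as (gamma + 1) / 2. The function names and the toy data are hypothetical and chosen for illustration.

```python
from itertools import combinations


def concordance_counts(item, score):
    """Count concordant and discordant pairs of test takers.

    Pairs tied on the item response or on the test score are skipped,
    as in the definition of Goodman-Kruskal gamma.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(item, score), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return concordant, discordant


def goodman_kruskal_gamma(item, score):
    """Gamma = (C - D) / (C + D), computed over untied pairs only."""
    c, d = concordance_counts(item, score)
    if c + d == 0:
        raise ValueError("no untied pairs; gamma is undefined")
    return (c - d) / (c + d)


def concordance_probability(gamma):
    """Share of untied pairs in which the higher item response
    goes with the higher test score: C / (C + D) = (gamma + 1) / 2."""
    return (gamma + 1) / 2


# Hypothetical toy data: one binary item and the total test score.
item = [0, 0, 1, 1, 1, 0]
score = [3, 5, 6, 8, 4, 2]

c, d = concordance_counts(item, score)   # 8 concordant, 1 discordant
gamma = goodman_kruskal_gamma(item, score)
print(c, d, round(gamma, 3), round(concordance_probability(gamma), 3))
```

In this toy example, 8 of the 9 untied pairs are concordant, so in roughly 89 out of 100 such pairs the test taker with the higher item response also has the higher score. The statement in the abstract is of the same kind, but made for all items combined.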