Kahveci Sercan, Bathke Arne C, Blechert Jens
Department of Psychology, Paris-Lodron-University of Salzburg, Hellbrunner Straße 34, 5020, Salzburg, Austria.
Centre for Cognitive Neuroscience, Paris-Lodron-University of Salzburg, Hellbrunner Straße 34, 5020, Salzburg, Austria.
Psychon Bull Rev. 2025 Apr;32(2):652-673. doi: 10.3758/s13423-024-02597-y. Epub 2024 Oct 23.
While it has become standard practice to report the reliability of self-report scales, it remains uncommon to do the same for experimental paradigms. To facilitate this practice, we review old and new ways to compute reliability in reaction-time tasks, and we compare their accuracy using a simulation study. Highly inaccurate and negatively biased reliability estimates are obtained through the common practice of averaging sets of trials and submitting them to Cronbach's alpha. Much more accurate reliability estimates are obtained using split-half reliability methods, especially by computing many random split-half correlations and aggregating them in a metric known as permutation-based split-half reliability. Through reanalysis of existing data and comparison of reliability values reported in the literature, we confirm that Cronbach's alpha also tends to be lower than split-half reliability in real data. We further establish a set of practices to maximize the accuracy of the permutation-based split-half reliability coefficient through simulations. We find that its accuracy is improved by ensuring each split-half dataset contains an approximately equal number of trials for each stimulus, by correcting the averaged correlation for test length using a modified variant of the Spearman-Brown formula, and by computing a sufficient number of split-half correlations: around 5,400 are needed to obtain a stable estimate for median-based double-difference scores computed from 30 participants and 256 trials. To conclude, we review the available software for computing this coefficient.
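The permutation-based split-half procedure summarized above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions. It is not the authors' implementation or any published package's API.

- All function names are hypothetical.
- Trials are stratified by (condition, stimulus) cell, so each half receives roughly equal numbers of trials per stimulus.
- The score is a simple median difference rather than the median-based double-difference score mentioned in the abstract.
- Raw split-half correlations are averaged before correction.
- The standard Spearman-Brown formula stands in for the modified variant the authors recommend.
- The toy data generator is invented for illustration.

```python
import numpy as np


def spearman_brown(r, length_factor=2.0):
    """Standard Spearman-Brown prophecy formula.

    length_factor=2 projects a half-length correlation to full test length.
    NOTE: the abstract describes a *modified* variant; it is not reproduced here.
    """
    return length_factor * r / (1.0 + (length_factor - 1.0) * r)


def stratified_half_mask(cond, stim, rng):
    """Boolean mask assigning about half of each (condition, stimulus) cell to half 1."""
    mask = np.zeros(cond.size, dtype=bool)
    for c in np.unique(cond):
        for s in np.unique(stim):
            idx = np.flatnonzero((cond == c) & (stim == s))
            if idx.size == 0:
                continue
            idx = rng.permutation(idx)
            n_first = idx.size // 2
            if idx.size % 2 == 1 and rng.random() < 0.5:
                n_first += 1  # odd cell: the leftover trial goes to a random half
            mask[idx[:n_first]] = True
    return mask


def median_difference(rt, cond):
    """Hypothetical score: median RT in condition 1 minus median RT in condition 0."""
    return np.median(rt[cond == 1]) - np.median(rt[cond == 0])


def permutation_split_half(participants, n_splits=5400, seed=None):
    """Average many random stratified split-half correlations, then apply Spearman-Brown.

    participants: list of dicts holding equal-length arrays "rt", "cond" (0/1), "stim".
    Returns (corrected reliability, array of the raw split-half correlations).
    """
    rng = np.random.default_rng(seed)
    r_splits = np.empty(n_splits)
    for i in range(n_splits):
        half1 = np.empty(len(participants))
        half2 = np.empty(len(participants))
        for j, p in enumerate(participants):
            mask = stratified_half_mask(p["cond"], p["stim"], rng)
            half1[j] = median_difference(p["rt"][mask], p["cond"][mask])
            half2[j] = median_difference(p["rt"][~mask], p["cond"][~mask])
        # Pearson correlation of the two half-scores across participants
        r_splits[i] = np.corrcoef(half1, half2)[0, 1]
    return spearman_brown(r_splits.mean()), r_splits


def simulate_participants(n_participants=30, n_trials=256, n_stim=8, seed=0):
    """Toy data (assumed): each participant has a true effect plus Gaussian trial noise."""
    rng = np.random.default_rng(seed)
    people = []
    for _ in range(n_participants):
        cond = np.repeat([0, 1], n_trials // 2)
        stim = np.tile(np.arange(n_stim), n_trials // n_stim)
        true_effect = rng.normal(50.0, 40.0)
        rt = 500.0 + true_effect * cond + rng.normal(0.0, 20.0, n_trials)
        people.append({"rt": rt, "cond": cond, "stim": stim})
    return people


if __name__ == "__main__":
    data = simulate_participants()
    reliability, rs = permutation_split_half(data, n_splits=200, seed=1)
    print(f"permutation split-half reliability: {reliability:.3f} ({rs.size} splits)")
```

The default of 5,400 splits echoes the figure the abstract reports for median-based double-difference scores from 30 participants and 256 trials. Other designs may need more or fewer splits. The nested Python loops keep the logic readable but are slow at that count. Other aggregation choices, such as averaging Fisher z-transformed correlations, are also possible.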