Cambridge Cognition, Cambridge, UK.
School of Psychological Science, University of Bristol, Bristol, UK.
Appl Neuropsychol Adult. 2022 Sep-Oct;29(5):889-892. doi: 10.1080/23279095.2020.1860987. Epub 2021 Jan 6.
Test-retest reliability is essential to the development and validation of psychometric tools. Here we respond to the article by Karlsen et al. (Applied Neuropsychology: Adult, 2020), which reports test-retest reliability for the Cambridge Neuropsychological Test Automated Battery (CANTAB) and whose results are in keeping with prior research on CANTAB and the broader cognitive assessment literature. However, having adopted a high threshold for adequate test-retest reliability, the authors report inadequate reliability for many measures. In this commentary we provide examples of stable, trait-like constructs that we would expect to remain highly consistent across longer time periods, and contrast these with measures that show acute within-subject change in response to contextual or psychological factors. Measures characterized by greater true within-subject variability typically have lower test-retest reliability, so research using them to examine group differences and longitudinal change must be adequately powered. However, these measures remain sensitive to important clinical and functional outcomes. Setting arbitrarily elevated test-retest reliability thresholds for test adoption in cognitive research limits the pool of available tools and precludes the adoption of many well-established tests showing consistent contextual, diagnostic, and treatment sensitivity. Overall, test-retest reliability must be balanced with other theoretical and practical considerations in study design, including test relevance and sensitivity.
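The link between within-subject variability, reliability, and statistical power can be illustrated with a minimal sketch under classical test theory. It assumes that reliability is the share of total variance due to stable between-subject differences, and that an observed standardized group difference is attenuated by the square root of reliability. The function names, the variance values, and the normal-approximation sample size formula are illustrative assumptions; none of them are taken from the commentary or from Karlsen et al.

```python
import math


def reliability_from_variances(between_var: float, within_var: float) -> float:
    """Share of total variance due to stable between-subject differences.

    Assumption: a simple classical-test-theory decomposition in which
    within-subject variability (true state change plus measurement error)
    adds to the total variance.
    """
    return between_var / (between_var + within_var)


def attenuated_effect_size(true_d: float, reliability: float) -> float:
    """Observed standardized difference when the measure is not perfectly reliable."""
    return true_d * math.sqrt(reliability)


def n_per_group(d: float, z_alpha: float = 1.959964, z_power: float = 0.841621) -> int:
    """Approximate per-group sample size for a two-group comparison.

    Normal approximation; the defaults correspond to two-sided alpha = 0.05
    and 80% power.
    """
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


if __name__ == "__main__":
    # Hypothetical variance components, chosen only for illustration.
    for within_var in (0.0, 1.0, 3.0):
        rel = reliability_from_variances(between_var=3.0, within_var=within_var)
        d_obs = attenuated_effect_size(true_d=0.5, reliability=rel)
        print(f"reliability={rel:.2f}  observed d={d_obs:.3f}  n per group={n_per_group(d_obs)}")
```

Under these assumptions, lower reliability does not make a measure unusable. It shrinks the observable effect, and the required sample size grows roughly in proportion to the inverse of the reliability.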