Maria M. Robinson, Jamal R. Williams, John T. Wixted, Timothy F. Brady
Department of Psychology, University of Warwick, Coventry, UK.
Department of Psychology, Yale University, New Haven, CT, USA.
Psychon Bull Rev. 2025 Apr;32(2):547-569. doi: 10.3758/s13423-024-02562-9. Epub 2024 Sep 17.
Research on best practices in theory assessment highlights that testing theories is challenging because they inherit a new set of assumptions as soon as they are linked to a specific methodology. In this article, we integrate and build on this work by demonstrating the breadth of these challenges. We show that tracking auxiliary assumptions is difficult because they are made at different stages of theory testing and at multiple levels of a theory. We focus on these issues in a reanalysis of a seminal study and its replications, all of which use a simple working-memory paradigm and a mainstream computational modeling approach. These studies provide the main evidence for "all-or-none" recognition models of visual working memory and are still used as the basis for measuring performance in popular visual working-memory tasks. In our reanalysis, we find that core practical auxiliary assumptions were unchecked and violated; the original model comparison metrics and data were not diagnostic in several experiments. Furthermore, we find that models were not matched on "theory-general" auxiliary assumptions, meaning that the set of tested models was restricted and not matched in theoretical scope. After testing these auxiliary assumptions and identifying diagnostic testing conditions, we find evidence for the opposite conclusion: continuous resource models outperform all-or-none models. Together, our work demonstrates why tracking and testing auxiliary assumptions remains a fundamental challenge, even in prominent studies led by careful, computationally minded researchers. Our work also serves as a conceptual guide on how to identify and test the gamut of auxiliary assumptions in theory assessment, and we discuss these ideas in the context of contemporary approaches to scientific discovery.