Department of Psychology, Arizona State University, PO Box 871104, Tempe, AZ, 85287, USA.
Psychometrika. 2024 Dec;89(4):1148-1169. doi: 10.1007/s11336-024-09988-z. Epub 2024 Jul 20.
This paper reflects on some practical implications of the excellent treatment of sum scoring and classical test theory (CTT) by Sijtsma et al. (Psychometrika 89(1):84-117, 2024). I have no major disagreements about the content they present and found it to be an informative clarification of the properties and possible extensions of CTT. In this paper, I focus on whether sum scores, despite their mathematical justification, are positioned to improve psychometric practice in empirical studies in psychology, education, and adjacent areas. First, I summarize recent reviews of psychometric practice in empirical studies, subsequent calls for greater psychometric transparency and validity, and how sum scores may or may not be positioned to adhere to such calls. Second, I consider limitations of sum scores for prediction, especially in the presence of common features like ordinal or Likert response scales, multidimensional constructs, and moderated or heterogeneous associations. Third, I review previous research outlining potential limitations of using sum scores as outcomes in subsequent analyses where rank ordering is not always sufficient to successfully characterize group differences or change over time. Fourth, I cover potential challenges for providing validity evidence for whether sum scores represent a single construct, particularly if one wishes to maintain minimal CTT assumptions. I conclude with thoughts about whether sum scores, even if mathematically justified, are positioned to improve psychometric practice in empirical studies.