Streiner David L
Baycrest Centre for Geriatric Care, Department of Psychiatry, University of Toronto, Ontario, Canada.
J Pers Assess. 2003 Jun;80(3):217-22. doi: 10.1207/S15327752JPA8003_01.
One of the central tenets of classical test theory is that scales should have a high degree of internal consistency, as evidenced by Cronbach's α, the mean interitem correlation, and a strong first component. However, there are many instances in which this rule does not apply. Following Bollen and Lennox (1991), I differentiate between questionnaires such as anxiety or depression inventories, which are composed of items that are manifestations of an underlying hypothetical construct (i.e., where the items are called effect indicators) and those such as Scale 6 of the Minnesota Multiphasic Personality Inventory (Hathaway & McKinley, 1943) and ones used to tap quality of life or activities of daily living in which the items or subscales themselves define the construct (these items are called causal indicators). Questionnaires of the first sort, which are referred to as scales in this article, meet the criteria of classical test theory, whereas the second type, which are called indexes here, do not. I discuss the implications of this difference for how items are selected, the relationship among the items, and the statistics that should and should not be used in establishing the reliability of the scale or index.
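As an illustration of the internal-consistency statistics named in the abstract, the following is a minimal sketch of how Cronbach's α and the mean interitem correlation could be computed from a respondents-by-items score matrix. It is not taken from the article: the function names (`cronbach_alpha`, `mean_interitem_correlation`) and the two tiny data sets are hypothetical, and the sketch assumes the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores) with sample variances.

```python
from itertools import combinations
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])  # variance of summed scores
    return k / (k - 1) * (1 - item_var_sum / total_var)


def pearson(x, y):
    """Pearson correlation between two equal-length score sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5


def mean_interitem_correlation(rows):
    """Average Pearson correlation over all distinct pairs of items."""
    items = list(zip(*rows))
    return mean(pearson(a, b) for a, b in combinations(items, 2))


# Hypothetical data: items that move together (as effect indicators would).
scale_rows = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]

# Hypothetical data: two unrelated items (as causal indicators may be).
index_rows = [[1, 0], [0, 1], [1, 1], [0, 0]]

print(cronbach_alpha(scale_rows), mean_interitem_correlation(scale_rows))
print(cronbach_alpha(index_rows), mean_interitem_correlation(index_rows))
```

In the first data set the items covary perfectly, so both statistics come out high. In the second the items are uncorrelated, so both come out near zero. A low value in the second case does not by itself indicate a flawed instrument, which is consistent with the abstract's argument that these statistics are not appropriate for indexes.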