School of Education, Stanford University, 485 Lasuen Mall, Stanford, CA 94305-3096, United States.
Soc Sci Res. 2012 Sep;41(5):1003-16. doi: 10.1016/j.ssresearch.2012.05.007. Epub 2012 May 16.
Survey researchers often administer batteries of questions to measure respondents' abilities, but these batteries are not always designed in keeping with the principles of optimal test construction. This paper illustrates one instance in which following these principles can improve a measurement tool used widely in the social and behavioral sciences: the General Social Survey (GSS) vocabulary test, called "Wordsum". This ten-item test is composed of very difficult items and very easy items, and item response theory (IRT) suggests that the omission of moderately difficult items is likely to have handicapped Wordsum's effectiveness. Analyses of data from national samples of thousands of American adults show that after adding four moderately difficult items to create a 14-item battery, "Wordsumplus" (1) outperformed the original battery in terms of quality indicators suggested by classical test theory; (2) reduced the standard error of IRT ability estimates in the middle of the latent ability dimension; and (3) exhibited higher concurrent validity. These findings show how to improve Wordsum and suggest that analysts should use a score based on all 14 items instead of the summary score provided by the GSS, which is based on only the original 10 items. These results also show more generally how surveys measuring abilities (and other constructs) can benefit from careful application of insights from the contemporary educational testing literature.
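The IRT argument above can be illustrated with a minimal sketch, assuming a two-parameter logistic (2PL) model. The item parameters below are hypothetical and chosen only to mimic a battery of very easy and very hard items; they are not the paper's estimates, and the function names are illustrative. Under this model the information an item contributes at ability theta is a^2 * p * (1 - p), and the asymptotic standard error of the ability estimate is one over the square root of the summed information.

```python
import math

def item_information(theta, a, b):
    """Fisher information of one 2PL item at ability theta.

    a is the discrimination and b the difficulty (both hypothetical here).
    """
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def standard_error(theta, items):
    """Asymptotic standard error of the ability estimate at theta."""
    info = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(info)

# Hypothetical (a, b) pairs: five very easy and five very hard items,
# plus four moderately difficult items. Not values from the paper.
easy = [(1.0, -2.0)] * 5
hard = [(1.0, 2.0)] * 5
moderate = [(1.0, 0.0)] * 4

ten_item = easy + hard
fourteen_item = ten_item + moderate

for theta in (-2.0, 0.0, 2.0):
    se10 = standard_error(theta, ten_item)
    se14 = standard_error(theta, fourteen_item)
    print(f"theta={theta:+.1f}  SE(10 items)={se10:.3f}  SE(14 items)={se14:.3f}")
```

With these assumed parameters, the added items lower the standard error everywhere, but the reduction is largest near theta = 0, where the ten-item battery has no items targeted at the respondent's ability.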