
Improving ability measurement in surveys by following the principles of IRT: The Wordsum vocabulary test in the General Social Survey.

Affiliation

School of Education, Stanford University, 485 Lasuen Mall, Stanford, CA 94305-3096, United States.

Publication information

Soc Sci Res. 2012 Sep;41(5):1003-16. doi: 10.1016/j.ssresearch.2012.05.007. Epub 2012 May 16.

DOI:10.1016/j.ssresearch.2012.05.007
PMID:23017913
Abstract

Survey researchers often administer batteries of questions to measure respondents' abilities, but these batteries are not always designed in keeping with the principles of optimal test construction. This paper illustrates one instance in which following these principles can improve a measurement tool used widely in the social and behavioral sciences: the GSS's vocabulary test called "Wordsum". This ten-item test is composed of very difficult items and very easy items, and item response theory (IRT) suggests that the omission of moderately difficult items is likely to have handicapped Wordsum's effectiveness. Analyses of data from national samples of thousands of American adults show that after adding four moderately difficult items to create a 14-item battery, "Wordsumplus" (1) outperformed the original battery in terms of quality indicators suggested by classical test theory; (2) reduced the standard error of IRT ability estimates in the middle of the latent ability dimension; and (3) exhibited higher concurrent validity. These findings show how to improve Wordsum and suggest that analysts should use a score based on all 14 items instead of using the summary score provided by the GSS, which is based on only the original 10 items. These results also show more generally how surveys measuring abilities (and other constructs) can benefit from careful application of insights from the contemporary educational testing literature.
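The abstract's second finding, a smaller standard error in the middle of the ability range, can be illustrated with item information functions. The sketch below is illustrative only. It assumes a two-parameter logistic (2PL) model, which the abstract does not confirm as the model fitted, and every discrimination and difficulty value is hypothetical rather than an estimate for any Wordsum item. The function names are likewise invented for this example.

```python
import math


def item_information(theta, a, b):
    """Fisher information of one 2PL item at ability theta.

    a is the discrimination and b the difficulty (both hypothetical here).
    """
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def standard_error(theta, items):
    """Asymptotic standard error of the ability estimate at theta.

    Test information is the sum of the item informations, and the
    standard error is the reciprocal of its square root.
    """
    info = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(info)


# Hypothetical (a, b) pairs: five very easy and five very hard items,
# mimicking a ten-item battery with no moderately difficult items.
easy_items = [(1.5, b) for b in (-2.5, -2.25, -2.0, -1.75, -1.5)]
hard_items = [(1.5, b) for b in (1.5, 1.75, 2.0, 2.25, 2.5)]
original_10 = easy_items + hard_items

# Four hypothetical moderately difficult items added to form a 14-item battery.
added_4 = [(1.5, b) for b in (-0.5, -0.2, 0.2, 0.5)]
extended_14 = original_10 + added_4

for theta in (-2.0, 0.0, 2.0):
    se_10 = standard_error(theta, original_10)
    se_14 = standard_error(theta, extended_14)
    print(f"theta={theta:+.1f}  SE(10 items)={se_10:.3f}  SE(14 items)={se_14:.3f}")
```

Under these assumed parameters, the added items contribute most of their information near the center of the ability scale, so the standard error drops far more at theta = 0 than at the extremes, where the original items already measure well.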


Similar articles

1
Improving ability measurement in surveys by following the principles of IRT: The Wordsum vocabulary test in the General Social Survey.
Soc Sci Res. 2012 Sep;41(5):1003-16. doi: 10.1016/j.ssresearch.2012.05.007. Epub 2012 May 16.
2
The measurement of patients' expectations for health care: a review and psychometric testing of a measure of patients' expectations.
Health Technol Assess. 2012 Jul;16(30):i-xii, 1-509. doi: 10.3310/hta16300.
3
Health-related quality of life in early breast cancer.
Dan Med Bull. 2010 Sep;57(9):B4184.
4
Improving the quality of the NCQA (National Committee for Quality Assurance) Annual Member Health Care Survey Version 1.0.
Am J Manag Care. 1997 May;3(5):719-30.
5
Practical issues in the application of item response theory: a demonstration using items from the pediatric quality of life inventory (PedsQL) 4.0 generic core scales.
Med Care. 2007 May;45(5 Suppl 1):S39-47. doi: 10.1097/01.mlr.0000259879.05499.eb.
6
The effects of violating standard item writing principles on tests and students: the consequences of using flawed test items on achievement examinations in medical education.
Adv Health Sci Educ Theory Pract. 2005;10(2):133-43. doi: 10.1007/s10459-004-4019-5.
7
Classical test theory and item response theory analyses of multi-item scales assessing parents' perceptions of their children's dental care.
Med Care. 2006 Nov;44(11 Suppl 3):S60-8. doi: 10.1097/01.mlr.0000245144.90229.d0.
8
[French version of TASTE (test for the ability and evaluation)].
Encephale. 2001 Nov-Dec;27(6):527-38.
9
A primer on classical test theory and item response theory for assessments in medical education.
Med Educ. 2010 Jan;44(1):109-17. doi: 10.1111/j.1365-2923.2009.03425.x.
10
[A neuro-psychological test (T-K-W test) for dementia based on working memory theory and item-response theory: its development and construction].
Seishin Shinkeigaku Zasshi. 2002;104(8):690-709.

Cited by

1
Developing a novel measure of non-rigid, ductile spatial skill.
Cogn Res Princ Implic. 2025 Mar 26;10(1):13. doi: 10.1186/s41235-025-00621-w.
2
Do Religiosity and Spirituality Differ in Their Relationship with Crystallized Intelligence? Evidence from the General Social Survey.
J Intell. 2024 Jul 7;12(7):65. doi: 10.3390/jintelligence12070065.
3
How is GPS used? Understanding navigation system use and its relation to spatial ability.
Cogn Res Princ Implic. 2024 Mar 19;9(1):16. doi: 10.1186/s41235-024-00545-x.
4
Visualizing Cross-Sections of 3D Objects: Developing Efficient Measures Using Item Response Theory.
J Intell. 2023 Oct 28;11(11):205. doi: 10.3390/jintelligence11110205.
5
Development and Validation of an Ability Measure of Emotion Understanding: The Core Relational Themes of Emotion (CORE) Test.
J Intell. 2023 Oct 9;11(10):195. doi: 10.3390/jintelligence11100195.
6
Neurocognition after motor vehicle collision and adverse post-traumatic neuropsychiatric sequelae within 8 weeks: Initial findings from the AURORA study.
J Affect Disord. 2022 Feb 1;298(Pt B):57-67. doi: 10.1016/j.jad.2021.10.104. Epub 2021 Nov 17.
7
Do Smarter People Have More Conservative Economic Attitudes? Assessing the Relationship Between Cognitive Ability and Economic Ideology.
Pers Soc Psychol Bull. 2022 Nov;48(11):1548-1565. doi: 10.1177/01461672211046808. Epub 2021 Sep 22.
8
By their words ye shall know them: Evidence of genetic selection against general intelligence and concurrent environmental enrichment in vocabulary usage since the mid 19th century.
Front Psychol. 2015 Apr 21;6:361. doi: 10.3389/fpsyg.2015.00361. eCollection 2015.
9
Reading ability and print exposure: item response theory analysis of the author recognition test.
Behav Res Methods. 2015 Dec;47(4):1095-1109. doi: 10.3758/s13428-014-0534-3.