What Affects the Quality of Score Transformations? Potential Issues in True-Score Equating Using the Partial Credit Model.

Authors

Fellinghauer Carolina, Debelak Rudolf, Strobl Carolin

Affiliation

University of Zurich, Switzerland.

Publication details

Educ Psychol Meas. 2023 Dec;83(6):1249-1290. doi: 10.1177/00131644221143051. Epub 2023 Jan 13.

DOI:10.1177/00131644221143051
PMID:37970488
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10638984/
Abstract

This simulation study investigated to what extent departures from construct similarity, as well as differences in the difficulty and targeting of scales, impact the score transformation when scales are equated by means of concurrent calibration using the partial credit model with a common person design. Practical implications of the simulation results are discussed with a focus on scale equating in health-related research settings. The study simulated data for two scales, varying the number of items and the sample sizes. The factor correlation between scales was used to operationalize construct similarity. Targeting of the scales was operationalized through increasing departure from equal difficulty and by varying the dispersion of the item and person parameters in each scale. The results show that low similarity between scales goes along with lower transformation precision. In cases with equal levels of similarity, precision improves in settings where the range of the item parameters encompasses the range of the person parameters. With decreasing similarity, score transformation precision benefits more from good targeting. Difficulty shifts of up to two logits somewhat increased the estimation bias but did not affect the transformation precision. The observed robustness against difficulty shifts supports the advantage of applying true-score equating methods over identity equating, which was used as a naive baseline method for comparison. Finally, larger sample sizes did not improve the transformation precision in this study, and longer scales only marginally improved the quality of the equating. The insights from the simulation study are used in a real-data example.
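
To make the comparison between true-score equating and identity equating concrete, the following is a minimal Python sketch and not the authors' code. It assumes the partial credit model threshold parameters of both scales have already been placed on a common metric (the study obtains this through concurrent calibration, which is not shown here). All parameter values and function names (`pcm_probs`, `expected_score`, `theta_for_score`, `true_score_equate`) are hypothetical and chosen only for illustration. A raw score on scale A is mapped to the person location at which it equals the expected score, and the expected score on scale B at that location is the equated score.

```python
import math

def pcm_probs(theta, thresholds):
    """Category probabilities of one partial credit item at person location theta."""
    # Cumulative sums of (theta - delta_j); category 0 has an empty sum of 0.
    logits = [0.0]
    for delta in thresholds:
        logits.append(logits[-1] + (theta - delta))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(theta, scale):
    """Test characteristic curve: expected raw score on a scale at theta."""
    return sum(
        k * p
        for thresholds in scale
        for k, p in enumerate(pcm_probs(theta, thresholds))
    )

def theta_for_score(score, scale, lo=-10.0, hi=10.0, tol=1e-8):
    """Invert the test characteristic curve by bisection (it increases with theta)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_score(mid, scale) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def true_score_equate(score, scale_a, scale_b):
    """Map a raw score on scale A to the expected score on scale B."""
    theta = theta_for_score(score, scale_a)
    return expected_score(theta, scale_b)

# Hypothetical thresholds: three items with three categories each.
scale_a = [[-1.0, 0.0], [-0.5, 0.5], [0.0, 1.0]]
# Scale B is scale A shifted by one logit, i.e. uniformly more difficult.
scale_b = [[t + 1.0 for t in item] for item in scale_a]

for raw in range(1, 6):
    equated = true_score_equate(raw, scale_a, scale_b)
    print(raw, round(equated, 2))  # identity equating would return raw unchanged
```

The minimum and maximum raw scores are left out of the loop because they correspond to infinitely low or high person locations and cannot be inverted this way. In this sketch, identity equating coincides with true-score equating only when both scales share the same thresholds, which is why a difficulty shift between scales penalizes the naive baseline.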

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f2f/10638984/eaf588f3ebfc/10.1177_00131644221143051-fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f2f/10638984/48f502f90fdb/10.1177_00131644221143051-fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f2f/10638984/8efec448b775/10.1177_00131644221143051-fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f2f/10638984/15d31cff1ba6/10.1177_00131644221143051-fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f2f/10638984/9032b2b98526/10.1177_00131644221143051-fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f2f/10638984/9ddbdab80c39/10.1177_00131644221143051-fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f2f/10638984/d007a14e5573/10.1177_00131644221143051-fig7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f2f/10638984/7a5c61aff93c/10.1177_00131644221143051-fig8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f2f/10638984/5ee62f306d0c/10.1177_00131644221143051-fig9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f2f/10638984/525c7f3688dc/10.1177_00131644221143051-fig10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f2f/10638984/86714a3a720b/10.1177_00131644221143051-fig11.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f2f/10638984/b068212c13e0/10.1177_00131644221143051-fig12.jpg

Similar articles

1
What Affects the Quality of Score Transformations? Potential Issues in True-Score Equating Using the Partial Credit Model.
Educ Psychol Meas. 2023 Dec;83(6):1249-1290. doi: 10.1177/00131644221143051. Epub 2023 Jan 13.
2
Evaluating Different Equating Setups in the Continuous Item Pool Calibration for Computerized Adaptive Testing.
Front Psychol. 2019 Jun 6;10:1277. doi: 10.3389/fpsyg.2019.01277. eCollection 2019.
3
Test equating sleep scales: applying the Leunbach's model.
BMC Med Res Methodol. 2019 Jul 8;19(1):141. doi: 10.1186/s12874-019-0768-y.
4
Comparing concurrent versus fixed parameter equating with common items: using the dichotomous and partial credit models in a mixed-item format test.
J Appl Meas. 2007;8(1):84-96.
5
Item Response Theory True Score Equating for the Bifactor Model Under the Common-Item Nonequivalent Groups Design.
Appl Psychol Meas. 2022 Sep;46(6):479-493. doi: 10.1177/01466216221108995. Epub 2022 Jun 17.
6
A Comparison of IRT Observed Score Kernel Equating and Several Equating Methods.
Front Psychol. 2020 Mar 6;11:308. doi: 10.3389/fpsyg.2020.00308. eCollection 2020.
7
Test equating of the Medical Licensing Examination in 2003 and 2004 based on the item response theory.
J Educ Eval Health Prof. 2006;3:2. doi: 10.3352/jeehp.2006.3.2. Epub 2006 Jul 8.
8
Asymptotic Standard Errors of Generalized Partial Credit Model True Score Equating Using Characteristic Curve Methods.
Appl Psychol Meas. 2021 Jul;45(5):331-345. doi: 10.1177/01466216211013101. Epub 2021 May 12.
9
Comparison of proficiency in an anesthesiology course across distinct medical student cohorts: psychometric approaches to test equating.
J Chin Med Assoc. 2014 Mar;77(3):150-4. doi: 10.1016/j.jcma.2013.10.011. Epub 2013 Nov 28.
10
Rasch Versus Classical Equating in the Context of Small Sample Sizes.
Educ Psychol Meas. 2020 Jun;80(3):499-521. doi: 10.1177/0013164419878483. Epub 2019 Sep 30.

Cited by

1
To scan or not to scan? Comparing the effectiveness and cost differential of insoles manufactured from foam-box casts versus direct scans in treating musculoskeletal conditions of the foot and ankle: a double-blinded, randomised controlled trial.
BMC Musculoskelet Disord. 2025 Mar 22;26(1):282. doi: 10.1186/s12891-025-08513-2.
2
Overview of Available Functioning Data in Switzerland: Supporting the Use of Functioning as a Health Indicator Alongside Mortality and Morbidity.
Int J Public Health. 2024 Aug 14;69:1607366. doi: 10.3389/ijph.2024.1607366. eCollection 2024.
