The inter-rater reliability and internal consistency of a clinical evaluation exercise.

Author information

Kroboth F J, Hanusa B H, Parker S, Coulehan J L, Kapoor W N, Brown F H, Karpf M, Levey G S

Affiliation

Division of General Internal Medicine, University of Pittsburgh, Pennsylvania 15261.

Publication information

J Gen Intern Med. 1992 Mar-Apr;7(2):174-9. doi: 10.1007/BF02598008.

PMID:1487766

Abstract

OBJECTIVE

To assess the internal consistency and inter-rater reliability of a clinical evaluation exercise (CEX) format designed to be easy to use yet detailed enough to achieve uniform recording of the observed examination.

DESIGN

A comparison of 128 CEXs conducted for 32 internal medicine interns by full-time faculty. This paper reports alpha coefficients as measures of internal consistency and several measures of inter-rater reliability.
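The alpha coefficients reported here are presumably Cronbach's alpha, the usual internal-consistency statistic; the abstract does not show how they were computed. A minimal sketch, assuming a score table with one row per observed CEX and one column per item on the form (the function name and the example scores are hypothetical, not the study's data):

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a table of scores.

    scores[i][j] is the score of observation i on item j (hypothetical layout).
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across observations.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(n_items)]
    # Variance of the total (summed) score across observations.
    total_var = pvariance([sum(row) for row in scores])
    # Population vs. sample variance does not matter: the divisor cancels in the ratio.
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: 4 observations x 3 items.
example = [[4, 5, 4], [2, 3, 3], [5, 5, 4], [3, 3, 2]]
print(round(cronbach_alpha(example), 3))  # 0.927
```

Alpha rises when items vary together across observations, which is why a form can show high alpha even when two raters disagree about the same performance.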

SETTING

A university internal medicine program. Observations were conducted at the end of the internship year.

PARTICIPANTS

Participants were 32 interns and observers were 12 full-time faculty in the department of medicine. The entire intern group was chosen to maximize the range of abilities represented. Patients used for the study were recruited by the chief resident from the inpatient medical service based on their ability and willingness to participate.

INTERVENTION

Each intern was observed twice, with two examiners present at each CEX. The examiners received standardized preparation and used a format developed over five years of previous pilot studies.
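With two examiners scoring each CEX, agreement between examiners can be summarized with an intraclass correlation. The abstract does not say which ICC form was used; the sketch below assumes the Shrout and Fleiss two-way random-effects, single-rater form, ICC(2,1), computed from a two-way ANOVA on a table with one row per intern and one column per examiner. The function name and data are hypothetical.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score given to target i (e.g. an intern) by rater j.
    """
    n = len(ratings)      # number of targets
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical overall ratings: 4 interns, each scored by 2 examiners.
ratings = [[4, 5], [2, 2], [5, 4], [3, 3]]
print(round(icc_2_1(ratings), 3))  # 0.842
```

The ICC compares variance between interns with the residual disagreement between examiners, so rater disagreement lowers it directly, unlike alpha.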

MEASUREMENTS AND MAIN RESULTS

The format appeared to have excellent internal consistency; alpha coefficients ranged from 0.79 to 0.99. Inter-rater reliability, however, was considerably lower, and multiple methods of determining it yielded similar results: intraclass correlations ranged from 0.23 to 0.50, and generalizability coefficients ranged from 0.00 for the overall rating of the CEX to 0.61 for the physical examination section. Neither transforming scores to eliminate rater effects nor dichotomizing results into pass/fail appeared to improve reliability.
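Dichotomizing into pass/fail turns each examiner's judgment into a two-category rating. The abstract does not name the agreement statistic used for the dichotomized results; a common choice is Cohen's kappa, which corrects raw percent agreement for the agreement expected by chance. A minimal sketch with hypothetical ratings (not the study's data):

```python
from collections import Counter


def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's (unweighted) kappa for two raters assigning nominal categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters chose the same category.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical pass/fail calls by two examiners on six CEXs.
a = ["pass", "pass", "pass", "fail", "pass", "fail"]
b = ["pass", "pass", "fail", "fail", "pass", "pass"]
print(round(cohen_kappa(a, b), 2))  # 0.25
```

Here the examiners agree on 4 of 6 cases (67%), yet kappa is only 0.25, because both pass most interns and much of that agreement is expected by chance.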

CONCLUSIONS

Although the CEX is a valuable didactic tool, its psychometric properties preclude reliable assessment of clinical skills as a one-time observation.
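The conclusion concerns a single observation. One standard way to reason about how reliability would change with more observations is the Spearman-Brown prophecy formula, ρ_k = kρ / (1 + (k − 1)ρ), which assumes the added observations are parallel (equally reliable and measuring the same thing). This is an illustration, not an analysis reported in the abstract; the 0.40 single-observation value below is hypothetical, chosen from inside the reported ICC range of 0.23 to 0.50.

```python
import math


def spearman_brown(single: float, k: float) -> float:
    """Predicted reliability of the average of k parallel observations."""
    return k * single / (1 + (k - 1) * single)


def observations_needed(single: float, target: float) -> int:
    """Smallest whole number of parallel observations reaching the target reliability."""
    if not (0 < single < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    k = target * (1 - single) / (single * (1 - target))
    # Subtract a small tolerance so floating-point noise (e.g. 6.000000000000001)
    # does not push an exact answer up to the next integer.
    return max(1, math.ceil(k - 1e-9))


print(round(spearman_brown(0.40, 6), 2))  # 0.8
print(observations_needed(0.40, 0.80))    # 6
print(observations_needed(0.23, 0.80))    # 14
```

Under these assumptions, a single observation with reliability near the reported range would need to be repeated several times before the averaged score reached a conventional threshold such as 0.80.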

Similar articles

The inter-rater reliability and internal consistency of a clinical evaluation exercise.

J Gen Intern Med. 1992 Mar-Apr;7(2):174-9. doi: 10.1007/BF02598008.

Inter-rater reliability and generalizability of patient note scores using a scoring rubric based on the USMLE Step-2 CS format.

Adv Health Sci Educ Theory Pract. 2016 Oct;21(4):761-73. doi: 10.1007/s10459-015-9664-3. Epub 2016 Jan 12.

The mini-CEX (clinical evaluation exercise): a preliminary investigation.

Ann Intern Med. 1995 Nov 15;123(10):795-9. doi: 10.7326/0003-4819-123-10-199511150-00008.

Real-time inter-rater reliability of the Council of Emergency Medicine residency directors standardized direct observation assessment tool.

Acad Emerg Med. 2009 Dec;16 Suppl 2:S51-7. doi: 10.1111/j.1553-2712.2009.00593.x.

Assessing the Validity of a Multidisciplinary Mini-Clinical Evaluation Exercise.

Teach Learn Med. 2018 Apr-Jun;30(2):152-161. doi: 10.1080/10401334.2017.1387553. Epub 2017 Dec 14.

Consistency, inter-rater reliability, and validity of 441 consecutive mock oral examinations in anesthesiology: implications for use as a tool for assessment of residents.

Anesthesiology. 1999 Jul;91(1):288-98. doi: 10.1097/00000542-199907000-00037.

Effect of rater training on reliability and accuracy of mini-CEX scores: a randomized, controlled trial.

J Gen Intern Med. 2009 Jan;24(1):74-9. doi: 10.1007/s11606-008-0842-3. Epub 2008 Nov 11.

Training less-experienced faculty improves reliability of skills assessment in cardiac surgery.

J Thorac Cardiovasc Surg. 2014 Dec;148(6):2491-6.e1-2. doi: 10.1016/j.jtcvs.2014.09.017. Epub 2014 Sep 16.

Using cloud-based mobile technology for assessment of competencies among medical students.

PeerJ. 2013 Sep 17;1:e164. doi: 10.7717/peerj.164. eCollection 2013.

Does scale length matter? A comparison of nine- versus five-point rating scales for the mini-CEX.

Adv Health Sci Educ Theory Pract. 2009 Dec;14(5):655-64. doi: 10.1007/s10459-008-9147-x. Epub 2008 Nov 26.

Cited by

Introducing a Comprehensive Framework for Competency-based Procedure Training.

J Gen Intern Med. 2025 Jul 8. doi: 10.1007/s11606-025-09677-2.

Exploring the perception of medical students and lecturers on the consequential validity of medical long case.

BMC Med Educ. 2025 Apr 22;25(1):588. doi: 10.1186/s12909-025-07055-4.

Initial Development of an Automated Platform for Assessing Trainee Performance on Case Presentations.

ATS Sch. 2022 Sep 23;3(4):548-560. doi: 10.34197/ats-scholar.2022-0010OC. eCollection 2022 Dec.

Current Trends in Mini-Clinical Evaluation Exercise in Medical Education: A Bibliometric Analysis.

Cureus. 2022 Dec 30;14(12):e33121. doi: 10.7759/cureus.33121. eCollection 2022 Dec.

Perception and Satisfaction of Undergraduate Medical Students of the Mini Clinical Evaluation Exercise Implementation in Orthopedic Outpatient Setting.

Adv Med Educ Pract. 2022 Sep 23;13:1159-1170. doi: 10.2147/AMEP.S375693. eCollection 2022.

Rater Training in Medical Education: A Scoping Review.

Cureus. 2020 Nov 6;12(11):e11363. doi: 10.7759/cureus.11363.

A Virtual Counseling Application Using Artificial Intelligence for Communication Skills Training in Nursing Education: Development Study.

J Med Internet Res. 2019 Oct 29;21(10):e14658. doi: 10.2196/14658.

Effect of rater training on the reliability of technical skill assessments: a randomized controlled trial.

Can J Surg. 2018 Oct 1;61(6):15917. doi: 10.1503/cjs.015917.

Reliability of rubrics in the assessment of orthodontic oral presentation.

Saudi Dent J. 2017 Oct;29(4):135-139. doi: 10.1016/j.sdentj.2017.07.001. Epub 2017 Aug 2.

Assessing Communication Skills in Real Medical Encounters in Oncology: Development and Validation of the ComOn-Coaching Rating Scales.

J Cancer Educ. 2019 Feb;34(1):73-81. doi: 10.1007/s13187-017-1269-5.
