Fraser R C, McKinley R K, Mulholland H
University of Leicester.
Br J Gen Pract. 1994 Jul;44(384):293-6.
An acceptable assessment must be both valid and reliable; the face validity of the Leicester assessment package has already been established.
This study set out to test the reliability of the Leicester assessment package, and the factors influencing it, when used by multiple assessors to assess performance in general practice consultations.
Six randomly selected course organizer assessors simultaneously used the package to conduct independent assessments of the performance of five doctors of widely varying abilities in consultation with six simulated patients. The scores allocated were subjected to generalizability analysis.
The mean scores allocated for consultation performance of individual doctors ranged from 51% to 70%, with the lower scores being allocated to the less experienced doctors. The scores of each assessor across the cases were examined for internal consistency: five of the six assessors scored the doctors consistently, with alpha coefficients at or above the minimum accepted level of 0.80, while the remaining assessor achieved a consistency of only 0.22. Measurements of consistency within cases between markers indicated that the first case produced unreliable results (alpha coefficient 0.25) but all other cases were scored consistently. Two independent assessors scoring eight consultations are required to achieve acceptable reliability in a formal assessment process: seven consultations yield the minimum acceptable generalizability coefficient of 0.80, plus the first 'non-counting' consultation.
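The study's conclusion that a particular number of assessors and consultations reaches the 0.80 threshold comes from a generalizability (D-study) analysis, which is not reproduced in the abstract. A closely related, simpler idea is the Spearman-Brown prophecy formula, which projects how reliability changes as a measurement is lengthened (more consultations, or more assessors whose scores are averaged). The sketch below is illustrative only and is not the paper's actual computation; the function name and example values are assumptions.

```python
def spearman_brown(rho: float, n: float) -> float:
    """Projected reliability when a measurement with reliability rho
    is lengthened by a factor n (e.g. n times as many consultations)."""
    return (n * rho) / (1 + (n - 1) * rho)

# Illustrative use: if a single consultation had reliability 0.40,
# how many consultations would push projected reliability past 0.80?
n = 1
while spearman_brown(0.40, n) < 0.80:
    n += 1
print(n)  # number of consultations needed under this toy assumption
```

Generalizability theory extends this logic by partitioning variance across facets (doctors, cases, assessors) simultaneously, which is why the study can state requirements for assessors and consultations jointly.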
Required levels of reliability can be achieved when the package is used by multiple markers assessing the same consultations over a wide range of consultation performance. To achieve reliability only two hours of assessment time are required using the Leicester package compared with the previously regarded minimum of 32 hours. Although assessors can produce reliable scores with minimal training, intra-assessor reliability cannot be taken for granted and all assessors should be trained and calibrated before being sanctioned to conduct assessments, particularly for regulatory purposes. The Leicester assessment package has now been shown to be valid, reliable, feasible and easy to use in practice. It can, therefore, be recommended for use in both formative and summative assessment of consultation competence in general practice.
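The alpha coefficients reported above (0.80 threshold, one outlier at 0.22) are internal-consistency statistics of the Cronbach's alpha form. As a hedged illustration of how such a coefficient is computed from a matrix of scores (cases as items, doctors as subjects), here is a minimal sketch; the function name and the toy score matrix are assumptions, not data from the study.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of items, where each inner list holds
    one item's scores (e.g. one case) across the same set of subjects."""
    k = len(scores)  # number of items (cases)
    item_variance_sum = sum(pvariance(item) for item in scores)
    totals = [sum(vals) for vals in zip(*scores)]  # per-subject total score
    total_variance = pvariance(totals)
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Toy example: two cases scored for three doctors, perfectly rank-consistent.
alpha = cronbach_alpha([[1, 2, 3], [2, 4, 6]])
```

In the study's design the same logic is applied both within assessors (across cases) and within cases (across assessors), which is how the inconsistent assessor and the unreliable first case were identified.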