Stanley I M, Webster C A, Webster J
J R Coll Gen Pract. 1985 Aug;35(277):375-80.
Factors governing the appropriateness, reliability and validity of rating scales in the measurement of professional performance are reviewed. The origin and preliminary testing among undergraduates and general practitioners of a brief consultation rating schedule are described. Statistical criteria are proposed for the analysis of ratings, by groups, in the comparison of consultation performance. Using these criteria, the capacity of the 10 rating schedule items to discriminate between two contrasting consultations was examined. Each of the items was used at some time by students or doctors to express significant preference for the same consultation; and on this basis all the items are considered to merit inclusion. One item showed highly significant intra- and inter-observer reliability. The schedule is reproduced in full, together with a data-collection document and significance chart, with the aim of encouraging groups of doctors to test the validity of the items in the comparison of other pairs of consultations. It is proposed that future versions of the schedule should reflect the experience of such groups in testing existing items and in defining additional items which satisfy the proposed criteria.