Evaluating peer reviews. Pilot testing of a grading instrument.

Author information

Feurer I D, Becker G J, Picus D, Ramirez E, Darcy M D, Hicks M E

Affiliation

Journal of Vascular and Interventional Radiology, Nashville, TN.

Publication information

JAMA. 1994 Jul 13;272(2):98-100. doi: 10.1001/jama.272.2.98.

DOI:10.1001/jama.272.2.98

PMID:8015141

Abstract

OBJECTIVE

To measure the reliability and preliminary validity of a grading instrument for editors to evaluate the quality of peer reviews.

DESIGN

A consecutive sample of 53 reviews of 23 manuscripts was used. Reviews were systematically assigned to an interrater reliability subsample (n = 41; power greater than 0.90 to detect a difference of more than one point) and a preliminary criterion-related validity subsample (n = 12). Content validity was also examined.

SETTING

Nonclinical.

PARTICIPANTS

Three graders evaluated reliability. One individual examined content validity, and two editors tested preliminary criterion-related validity.

INTERVENTION (INSTRUMENT)

Attributes reflecting two basic dimensions, review content and format, were identified and scored. Values are maximum possible points/percent contribution to the 14-point total: timeliness, 3/21%; grade sheet, 1/7%; etiquette, 1/7%; sectional narratives, 3/21%; citations, 2/14%; narrative summary, 2/14%; and insights, 2/14%. A scoring guide was provided.
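The point allocation above can be sketched as a small lookup table. This is an illustrative sketch only: the attribute names and maximum points come from the abstract, but the identifiers (`MAX_POINTS`, `total_score`), the assumption that each attribute is scored from 0 up to its maximum, and the example review are hypothetical and are not taken from the published scoring guide.

```python
# Maximum points per attribute, as listed in the abstract (sum = 14).
MAX_POINTS = {
    "timeliness": 3,
    "grade_sheet": 1,
    "etiquette": 1,
    "sectional_narratives": 3,
    "citations": 2,
    "narrative_summary": 2,
    "insights": 2,
}


def percent_contributions(max_points=MAX_POINTS):
    """Return each attribute's share of the total, rounded to a whole percent."""
    total = sum(max_points.values())
    return {name: round(100 * pts / total) for name, pts in max_points.items()}


def total_score(scores, max_points=MAX_POINTS):
    """Sum the attribute scores of one review.

    Assumption (not stated in the abstract): every attribute must be scored,
    and each score lies between 0 and the attribute's maximum.
    """
    if set(scores) != set(max_points):
        raise ValueError("scores must cover exactly the instrument's attributes")
    for name, value in scores.items():
        if not 0 <= value <= max_points[name]:
            raise ValueError(f"{name} score {value} is outside 0..{max_points[name]}")
    return sum(scores.values())


if __name__ == "__main__":
    # Hypothetical review, for illustration only.
    example_review = {
        "timeliness": 3,
        "grade_sheet": 1,
        "etiquette": 1,
        "sectional_narratives": 2,
        "citations": 1,
        "narrative_summary": 2,
        "insights": 1,
    }
    print(percent_contributions())
    print(total_score(example_review))
```

Rounding each share of the 14-point total to a whole percent reproduces the 21%, 7%, and 14% figures quoted in the abstract.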

MAIN OUTCOME MEASURES

The interrater reliability of the total score was tested with the intraclass correlation coefficient and with analysis of variance, for which the null hypothesis of no difference between mean scores was expected to be retained. Kendall's coefficient of concordance was used to test preliminary criterion-related validity.
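For readers unfamiliar with the reliability statistics named here, the following is a minimal sketch of how an intraclass correlation coefficient and the accompanying grader-effect F statistic can be computed from a reviews-by-graders table. The abstract does not say which form of the coefficient was used; the sketch assumes the two-way random-effects, absolute-agreement, single-rater form often written ICC(2,1). The function names and the example scores are hypothetical.

```python
def two_way_anova(scores):
    """Mean squares for a complete table: scores[i][j] = score of review i by grader j."""
    n = len(scores)        # number of reviews (rows)
    k = len(scores[0])     # number of graders (columns)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between reviews
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between graders
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    return {
        "n": n,
        "k": k,
        "ms_rows": ss_rows / (n - 1),
        "ms_cols": ss_cols / (k - 1),
        "ms_error": ss_error / ((n - 1) * (k - 1)),
    }


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater (assumed form)."""
    a = two_way_anova(scores)
    n, k = a["n"], a["k"]
    msr, msc, mse = a["ms_rows"], a["ms_cols"], a["ms_error"]
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def grader_f_statistic(scores):
    """F ratio for a difference between grader means (df = k-1 and (n-1)(k-1))."""
    a = two_way_anova(scores)
    return a["ms_cols"] / a["ms_error"]


if __name__ == "__main__":
    # Hypothetical total scores: five reviews, each graded by three graders.
    example = [
        [12, 11, 12],
        [7, 8, 7],
        [10, 10, 9],
        [5, 6, 6],
        [13, 13, 12],
    ]
    print(icc_2_1(example))
    print(grader_f_statistic(example))
```

A high coefficient together with a small F ratio corresponds to the pattern the study was looking for: graders rank reviews consistently and their mean scores do not differ systematically.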

RESULTS

The intraclass correlation coefficient was .84 (P < .001), and analysis of variance showed no significant difference between mean scores (P = .46). Content validity was confirmed, and preliminary criterion-related validity was indicated (Kendall's coefficient of concordance = .94, P = .038).
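Kendall's coefficient of concordance (W) can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' analysis: it uses average ranks for ties with the standard tie correction, and it returns the usual chi-square approximation, m(n - 1)W on n - 1 degrees of freedom, without computing a P value. The function names and example scores are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """ratings[j][i] = score given by rater j to item i.

    Returns (W, chi_square, degrees_of_freedom). Raises ZeroDivisionError
    in the degenerate case where every rater ties all items.
    """
    m = len(ratings)       # number of raters
    n = len(ratings[0])    # number of items
    rank_rows = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)

    # Tie correction: sum of (t^3 - t) over each rater's groups of tied scores.
    ties = sum(t ** 3 - t for row in ratings for t in Counter(row).values())

    w = 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)
    return w, m * (n - 1) * w, n - 1


if __name__ == "__main__":
    # Hypothetical total scores from two editors for six reviews.
    editor_a = [12, 7, 10, 5, 13, 9]
    editor_b = [11, 8, 10, 6, 13, 7]
    print(kendalls_w([editor_a, editor_b]))
```

If the reported figures came from two editors and 12 reviews analysed with this approximation (an assumption), W = .94 gives a chi-square of about 20.7 on 11 degrees of freedom, which lies between the .05 and .025 critical values and is therefore consistent with the reported P = .038.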

CONCLUSIONS

The instrument is reliable. Content validation has been completed, and further criterion-related validation is warranted.
