Herrick Ariane L, Roberts Christopher, Tracey Andrew, Silman Alan, Anderson Marina, Goodfield Mark, McHugh Neil, Muir Lindsay, Denton Christopher P
University of Manchester, Manchester, UK, and Salford Royal National Health Service Foundation Trust, Salford, UK.
Arthritis Rheum. 2009 Mar;60(3):878-82. doi: 10.1002/art.24333.
To test the intra- and interobserver variability, among clinicians with an interest in systemic sclerosis (SSc), in defining digital ulcers.
Thirty-five images of finger lesions, incorporating a wide range of abnormalities at different sites, were duplicated, yielding a data set of 70 images. Physicians with an interest in SSc were invited to take part in the Web-based study, which involved looking through the images in a random sequence. The sequence differed for individual participants and prevented cross-checking with previous images. Participants were asked to grade each image as depicting "ulcer" or "no ulcer," and if "ulcer," then either "inactive" or "active." Images of a range of exemplar lesions were available for reference purposes while participants viewed the test images. Intrarater reliability was assessed using a weighted kappa coefficient with quadratic weights. Interrater reliability was estimated using a multirater weighted kappa coefficient.
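The intrarater statistic described above can be illustrated with a minimal sketch of a quadratic-weighted kappa for one participant's two gradings of the same images. The integer coding (0 = no ulcer, 1 = inactive ulcer, 2 = active ulcer), the ordering of those categories, and the function name `quadratic_weighted_kappa` are assumptions made for illustration; the abstract does not state how the grades were coded or which software was used.

```python
from collections import Counter


def quadratic_weighted_kappa(first, second, n_categories=3):
    """Quadratic-weighted kappa for two gradings of the same items.

    Assumed coding (not stated in the abstract):
    0 = no ulcer, 1 = inactive ulcer, 2 = active ulcer.
    """
    if len(first) != len(second) or not first:
        raise ValueError("gradings must be non-empty and of equal length")

    n = len(first)
    k = n_categories
    observed = Counter(zip(first, second))  # joint counts of (first, second) grades
    row = Counter(first)                    # marginal counts, first viewing
    col = Counter(second)                   # marginal counts, second viewing

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic disagreement weight: 0 on the diagonal, 1 at the extremes.
            w = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += w * observed[(i, j)] / n
            weighted_expected += w * (row[i] / n) * (col[j] / n)

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when only one category is used")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical gradings of six images on first and second viewing.
first_view = [0, 0, 1, 2, 2, 1]
second_view = [0, 1, 1, 2, 1, 1]
print(quadratic_weighted_kappa(first_view, second_view))
```

With quadratic weights, a disagreement between "no ulcer" and "active" is penalized four times as heavily as a disagreement between adjacent grades. Averaging the per-participant values would give a mean of the kind reported for intrarater reliability; the multirater interrater coefficient is a different calculation and is not shown here.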
Fifty individuals (most of them rheumatologists) from 15 countries participated in the study. There was a high level of intrarater reliability, with a mean weighted kappa value of 0.81 (95% confidence interval [95% CI] 0.77, 0.84). Interrater reliability was poorer (weighted kappa = 0.46 [95% CI 0.35, 0.57]).
The poor interrater reliability suggests that if digital ulceration is to be used as an end point in multicenter clinical trials of SSc, then strict definitions must be developed. The present investigation also demonstrates the feasibility of Web-based studies, for which large numbers of participants can be recruited over a short time frame.