Landewé Robert B M, Hermann Kay-Geert A, van der Heijde Désirée M F M, Baraliakos Xenophon, Jurik Anne-Grethe, Lambert Robert G, Østergaard Mikkel, Rudwaleit Martin, Salonen David C, Braun Jürgen
Department of Internal Medicine, Division of Rheumatology, University Hospital Maastricht, University Maastricht, 6202 AZ Maastricht, The Netherlands.
J Rheumatol. 2005 Oct;32(10):2050-5.
Magnetic resonance imaging (MRI) of the sacroiliac (SI) joints and the spine is increasingly important in the assessment of inflammatory activity and structural damage in clinical trials in patients with ankylosing spondylitis (AS). We investigated the inter-reader reliability and sensitivity to change of several scoring systems for assessing disease activity and change in disease activity in patients with AS. Twenty sets of consecutive MRI, derived from a randomized clinical trial comparing an active drug with placebo and selected on the basis of the presence of activity at baseline, were presented electronically to 7 experienced readers from different countries (Europe, Canada). Readers scored the MRI by 3 different methods: a global score (grading activity per SI joint); a more comprehensive global score (grading activity per SI joint per quadrant); and a detailed scoring system [Spondyloarthritis Research Consortium of Canada (SPARCC) scoring system], which scores 6 images, divided into quadrants, with additional scores for "depth" and "intensity." A fourth and a fifth scoring system were constructed afterwards. The fourth method was the SPARCC score without the additional scores for "depth" and "intensity," and the fifth method was the score of the SPARCC slice with the maximum score. Inter-reader reliability was investigated by calculating intraclass correlation coefficients (ICC) for all readers together and for all possible reader pairs. Sensitivity to change was investigated by calculating standardized response means (SRM) on change scores that were made positive. Overall inter-reader ICC per method were between 0.47 and 0.58 for scoring status, and between 0.40 and 0.53 for scoring change. ICC for individual reader pairs fluctuated much more within each method, with the lowest observed values close to zero (no agreement) and the highest observed values over 0.80 (excellent agreement).
In general, agreement on status scores was somewhat better than agreement on change scores, and agreement with the comprehensive SPARCC scoring system was somewhat better than with the more condensed systems. Sensitivity to change differed per reader, but in general was somewhat better for the comprehensive SPARCC system. This experiment, conducted under "real-life," far from optimal conditions, demonstrates the feasibility of scoring exercises for method comparison, provides evidence for the reliability and sensitivity to change of scoring systems to be used in assessing activity of SI joints in clinical trials, and sets the conditions for further validation research in this field.
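The two statistics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which ICC form was used, so the code assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). It also assumes that "made positive" means the change is signed so that a fall in activity score counts as a positive change. The function names and the `improvement_is_decrease` parameter are hypothetical and are not taken from the study.

```python
from itertools import combinations
from statistics import mean, stdev


def icc_2_1(scores):
    """ICC(2,1) for a table with one row per MRI set and one column per reader.

    Assumption: two-way random effects, absolute agreement, single rater.
    The abstract does not state which ICC form the authors used.
    """
    n = len(scores)        # number of MRI sets (subjects)
    k = len(scores[0])     # number of readers
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-reader mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def pairwise_icc(scores):
    """ICC(2,1) for every reader pair, keyed by the pair of column indices."""
    k = len(scores[0])
    return {
        (a, b): icc_2_1([[row[a], row[b]] for row in scores])
        for a, b in combinations(range(k), 2)
    }


def srm(baseline, followup, improvement_is_decrease=True):
    """Standardized response mean: mean change divided by the SD of change.

    Assumption: the sign is chosen so that improvement is positive.
    """
    if improvement_is_decrease:
        changes = [b - f for b, f in zip(baseline, followup)]
    else:
        changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(changes)
```

With 7 readers, `pairwise_icc` returns 21 values per scoring method, which is the kind of pairwise spread the abstract describes. An SRM would be computed separately for each reader and each method from that reader's baseline and follow-up scores.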