Department of Rheumatology, University Hospital at Glostrup, Copenhagen, Denmark.
Rheumatology (Oxford). 2012 Nov;51(11):2034-8. doi: 10.1093/rheumatology/kes124. Epub 2012 Jul 30.
To evaluate the reliability and agreement of semi-quantitative scoring (SQS) and quantitative scoring (QS) systems for colour Doppler ultrasound. To compare the two types of scoring system and to investigate the construct validity of both.
A total of 46 RA patients (median disease duration 6.5 years) were enrolled in the study. Each patient was examined with colour Doppler ultrasound of the wrist in the central position. The 28-joint Disease Activity Score (DAS-28) was calculated for all patients using CRP. Two assessors trained in the SQS system and two trained in the QS system evaluated the 46 anonymized images. Each assessor scored all images twice, so that both intra- and inter-reader reliability could be assessed.
The reliability was 0.964 for the QS system and 0.817 for the SQS system, with comparable inter-reader agreement for both scoring systems. The 95% limits of agreement were -7.7% to +6.7% on the colour fraction scale (0-100%) for the QS system and -0.8 to +0.8 on the ordinal scale (0-3) for the SQS system. There was a direct but non-linear relationship between the two scoring systems (Spearman's r = 0.73), and critical conceptual issues in the agreement between the systems were revealed. The construct validity was poor for both systems, with only a weak correlation with CRP.
High reliability and good agreement were found for both scoring systems when applied to the same patient cohort, and the two scoring systems appear to be highly correlated.