Ward W T, Vogt M, Grudziak J S, Tümer Y, Cook P C, Fitch R D
University of Pittsburgh, Pennsylvania, USA.
J Bone Joint Surg Am. 1997 May;79(5):656-63. doi: 10.2106/00004623-199705000-00004.
The Severin classification system frequently is used to evaluate the radiographic results of operations performed for the treatment of congenital dislocation of the hip. However, the reliability of this classification scheme has not been established, to our knowledge. Ideally, a classification system should be validated before it is used to promote therapeutic guidelines or to compare results of treatment; the purpose of the present study was to establish the intraobserver and interobserver reliability of the Severin classification system. Four blinded raters and the operating surgeon independently used the Severin system to evaluate the most recent radiographs of thirty-seven children (fifty-six hips) who had been managed, an average of nine years previously, with a medial open reduction for congenital dislocation of the hip. Three of the raters evaluated the same radiographs again under similar testing circumstances eight weeks later. Ten paired interobserver and three intraobserver comparisons then were analyzed with use of the Cohen kappa coefficient (kappa). The average kappa coefficient for the six pairwise comparisons between the four blinded raters was 0.15 (range, -0.05 to 0.42) when all Severin classes were analyzed independently. The average kappa coefficient for the four pairwise comparisons between the blinded raters and the operating surgeon was even lower (0.02). The kappa coefficients for the three intraobserver comparisons were 0.20, 0.38, and 0.44 (average, 0.34). Kappa analysis demonstrated variable and low levels of agreement when the Severin system was used to rate the results of operations performed for the treatment of congenital dislocation of the hip. We believe that the unadjusted kappa coefficient should indicate excellent agreement (kappa > 0.75) for all comparisons if this system is to be used for the evaluation of clinical results. 
The unacceptably low levels of intraobserver and interobserver reliability call into question the clinical conclusions of reports in which the Severin system has been used as the basis of proof.
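For reference, the unweighted Cohen kappa compares the observed proportion of agreement between two raters (p_o) with the proportion expected by chance from each rater's class frequencies (p_e): kappa = (p_o - p_e) / (1 - p_e). The following is a minimal Python sketch of that calculation and of the pairwise comparison scheme described above (five raters give ten pairs). It is not the authors' analysis code: the function names `cohen_kappa` and `pairwise_kappas` are hypothetical, and the example ratings are invented for illustration and do not reproduce the study data.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen kappa for two raters who classified the same items."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must classify the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same class by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the two raters' marginal frequencies, summed over classes.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical class; kappa is undefined here.
        # Returning 1.0 is a convention chosen for this sketch (assumption).
        return 1.0
    return (observed - expected) / (1.0 - expected)


def pairwise_kappas(ratings_by_rater):
    """Kappa for every unordered pair of raters (interobserver comparisons)."""
    names = sorted(ratings_by_rater)
    return {
        (x, y): cohen_kappa(ratings_by_rater[x], ratings_by_rater[y])
        for x, y in combinations(names, 2)
    }


# Hypothetical Severin classes for eight hips; NOT the study data.
example_ratings = {
    "rater_1": ["I", "I", "II", "II", "III", "IV", "I", "II"],
    "rater_2": ["I", "II", "II", "III", "III", "IV", "II", "II"],
    "rater_3": ["II", "I", "II", "II", "IV", "III", "I", "I"],
    "rater_4": ["I", "I", "III", "II", "III", "IV", "I", "III"],
    "surgeon": ["I", "I", "I", "II", "II", "III", "I", "I"],
}

if __name__ == "__main__":
    for pair, kappa in pairwise_kappas(example_ratings).items():
        print(pair, round(kappa, 2))
```

Intraobserver reliability can be estimated with the same function by passing one rater's first and second readings of the same radiographs as the two arguments.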