Smith S W, Meyer R A, Connor P M, Smith S E, Hanley E N
Department of Orthopaedic Surgery, Carolinas Medical Center, Charlotte, North Carolina 28232, USA.
J Bone Joint Surg Am. 1996 Nov;78(11):1702-6. doi: 10.2106/00004623-199611000-00010.
Anteroposterior and lateral plain radiographs of 116 osteonecrotic femoral heads were reviewed to assess the interobserver reliability and intraobserver reproducibility of the modified Ficat classification system. The radiographs were reviewed initially and then again six months later by three adult reconstructive surgeons, two general orthopaedic surgeons, two orthopaedic residents, and one musculoskeletal radiologist. All eight observers agreed on the classification of twenty hips (17 per cent) at both the first and the second review of the radiographs. Paired comparisons revealed a mean interobserver kappa reliability coefficient of 0.46 (range, 0.30 to 0.67) for the first review and 0.45 (range, 0.30 to 0.66) for the second. For all observers, the mean rate of perfect agreement between the first and the second review was 68 per cent (range, 56 to 80 per cent). The mean kappa value for intraobserver reproducibility was 0.59 (range, 0.44 [one of the residents] to 0.73 [one of the general orthopaedic surgeons]). No observer or pair of observers had excellent reproducibility or reliability (kappa > 0.75). The poor interobserver reliability and fair intraobserver reproducibility undermine any meaningful comparison of studies in which the modified Ficat classification system has been used and highlight the need for a more reliable and reproducible classification system.
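The kappa values above are chance-corrected agreement statistics. The abstract does not state which kappa variant was used, so the following minimal sketch assumes the unweighted Cohen's kappa, κ = (p_o − p_e)/(1 − p_e). Here p_o is the observed proportion of identical classifications (the "perfect agreement" rate), and p_e is the agreement expected by chance from each observer's stage frequencies. The interobserver figure is a mean over paired comparisons, and eight observers give C(8, 2) = 28 pairs. The intraobserver figure compares one observer's first and second readings. The function names and stage labels are hypothetical and serve only as illustration.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two sets of labels on the same items (assumed variant)."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of hips given the identical stage.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal stage frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if p_e == 1:
        # Degenerate case: both raters used one identical stage for every hip.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(ratings):
    """Mean, min and max kappa over all unordered pairs of observers.

    `ratings` maps a (hypothetical) observer name to that observer's stage list.
    """
    kappas = [cohen_kappa(ratings[r1], ratings[r2])
              for r1, r2 in combinations(ratings, 2)]
    return sum(kappas) / len(kappas), min(kappas), max(kappas)


# Hypothetical readings of six hips (illustrative labels, not study data).
first = ["I", "II", "II", "III", "IV", "III"]
second = ["I", "II", "III", "III", "IV", "II"]
print(cohen_kappa(first, second))  # intraobserver-style comparison
print(mean_pairwise_kappa({"obs1": first, "obs2": second, "obs3": first}))
```

In this toy example the two readings agree on 4 of 6 hips (p_o ≈ 0.67), but the chance correction brings κ down to 7/13 ≈ 0.54. This illustrates why the study's 68 per cent perfect-agreement rate corresponds to a noticeably lower mean intraobserver kappa.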