M. Belt, K. Smulders, P. Heesterbeek, Research Department, Sint Maartenskliniek, Nijmegen, the Netherlands.
A. van Houten, A. Wymenga, G. van Hellemondt, Department of Orthopedics, Sint Maartenskliniek, Nijmegen, the Netherlands.
Clin Orthop Relat Res. 2020 Sep;478(9):2057-2064. doi: 10.1097/CORR.0000000000001084.
BACKGROUND: Accurate quantification of bone loss facilitates preoperative planning and standardization for research purposes in patients who undergo revision TKA. The most commonly used classification to rate bone defects in this setting, the Anderson Orthopaedic Research Institute classification, does not quantify diaphyseal bone loss, and its reliability has not been well studied.
QUESTIONS/PURPOSES: We developed a new classification scheme to rate bone defects in patients undergoing revision TKA and tested (1) the intraobserver and interobserver reliability of this classification for revision TKA based on preoperative radiographs, and (2) whether additional CT images might improve interobserver reliability.
METHODS: This was a preregistered observational study. Interobserver reliability was analyzed using preoperative radiographs of 61 patients who underwent (repeat) revision TKA; their bone defects were rated by five experienced orthopaedic surgeons. For intraobserver reliability, ratings were repeated at least 2 weeks after the first rating (Timepoints 1 and 2). Directly after the radiographic assessments of Timepoint 2, the observers were provided with CT images of each patient and asked to rate the bone defects a third time (Timepoint 3) to assess the additional value of CT. Intraobserver and interobserver reliability were tested using Gwet's agreement coefficient 2, which is a measure of agreement between observers in categorical data. Substantial agreement was defined as a coefficient between 0.61 and 0.8 and almost perfect agreement as a coefficient > 0.8.
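As an illustration of the agreement statistic named above, the following is a minimal sketch of Gwet's agreement coefficient 2 for several raters, together with the two agreement thresholds stated in the abstract. The weighting scheme is not given in the abstract, so the linear ordinal weights used by default here are an assumption, as are the function names `gwet_ac2` and `agreement_label`. The sketch returns a point estimate only, without confidence intervals, and is not the authors' analysis code.

```python
def gwet_ac2(ratings, categories, weights=None):
    """Point estimate of Gwet's AC2 for multiple raters.

    ratings    : list of subjects, each a list of category labels (one per rater)
    categories : ordered list of all possible category labels
    weights    : q x q agreement weights; linear ordinal weights are ASSUMED
                 when omitted (the abstract does not state the scheme used)
    """
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories are required")
    index = {c: k for k, c in enumerate(categories)}
    if weights is None:
        weights = [[1 - abs(k - l) / (q - 1) for l in range(q)] for k in range(q)]

    n = len(ratings)
    pi = [0.0] * q      # average share of ratings falling in each category
    p_a = 0.0           # weighted observed agreement
    for subject in ratings:
        counts = [0] * q
        for label in subject:
            counts[index[label]] += 1
        r_i = sum(counts)
        if r_i < 2:
            raise ValueError("each subject needs at least two ratings")
        # Weighted count: ratings in category k plus partial credit for neighbours
        weighted = [sum(weights[k][l] * counts[l] for l in range(q)) for k in range(q)]
        p_a += sum(counts[k] * (weighted[k] - 1) for k in range(q)) / (r_i * (r_i - 1))
        for k in range(q):
            pi[k] += counts[k] / r_i
    p_a /= n
    pi = [p / n for p in pi]

    # Chance agreement under Gwet's model
    t_w = sum(sum(row) for row in weights)
    p_e = t_w / (q * (q - 1)) * sum(p * (1 - p) for p in pi)
    return (p_a - p_e) / (1 - p_e)


def agreement_label(coefficient):
    """Apply the two thresholds defined in the abstract."""
    if coefficient > 0.8:
        return "almost perfect"
    if coefficient >= 0.61:
        return "substantial"
    return "below substantial"


if __name__ == "__main__":
    # Hypothetical ratings: 4 knees, 3 raters, ordinal grades 0 to 2
    demo = [[0, 0, 0], [1, 1, 2], [2, 2, 2], [0, 1, 0]]
    ac2 = gwet_ac2(demo, categories=[0, 1, 2])
    print(round(ac2, 3), agreement_label(ac2))
```

With identity weights (1 on the diagonal, 0 elsewhere) the same function reduces to Gwet's unweighted AC1.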
RESULTS: The intraobserver reliability varied between 0.55 (95% CI 0.40 to 0.71) and 0.87 (95% CI 0.78 to 0.96) in the epiphysis, between 0.69 (95% CI 0.58 to 0.80) and 0.98 (95% CI 0.95 to 1) in the metaphysis, and between 0.95 (95% CI 0.90 to 0.99) and 0.99 (95% CI 0.98 to 1) in the diaphysis. At Timepoint 1, the interobserver reliability varied between 0.48 (95% CI 0.39 to 0.57) and 0.49 (95% CI 0.42 to 0.56) in the epiphysis and between 0.81 (95% CI 0.75 to 0.87) and 0.88 (95% CI 0.83 to 0.93) in the metaphysis, and was 0.96 (95% CI 0.93 to 0.99) in the diaphysis. The interobserver reliability at Timepoint 2 was similar to that of Timepoint 1. The addition of CT images did not improve reliability (Timepoint 3).
CONCLUSIONS: The bone defect classification was less reliable in the epiphysis than in the metaphysis and diaphysis. This finding may be explained by prosthetic components obscuring this region or by the more severe bone defects in this region. The addition of CT scans did not improve reliability. Further reliability testing with observers from other institutions is necessary, as is validity testing that compares the classification with intraoperative findings.
LEVEL OF EVIDENCE: Level III, diagnostic study.