A. El-Galaly, A. Kappel, P. T. Nielsen, S. L. Jensen, Orthopedic Research Unit, Aalborg University Hospital, Aalborg, Denmark.
A. El-Galaly, A. Kappel, P. T. Nielsen, S. L. Jensen, Department of Clinical Medicine, Aalborg University, Aalborg, Denmark.
Clin Orthop Relat Res. 2020 Sep;478(9):2088-2101. doi: 10.1097/CORR.0000000000001343.
Revision TKA is a serious adverse event with substantial consequences for the patient. As the demand for TKA rises, reducing the risk of revision TKA is becoming increasingly important. Predictive tools based on machine-learning algorithms could reform clinical practice. Few attempts have been made to combine machine-learning algorithms with data from nationwide arthroplasty registries and, to the authors' knowledge, none have tried to predict the likelihood of early revision TKA.
QUESTION/PURPOSES: We used the Danish Knee Arthroplasty Registry to build models to predict the likelihood of revision TKA within 2 years of primary TKA and asked: (1) Which preoperative factors were the most important features behind these models' predictions of revision? (2) Can a clinically meaningful model be built on the preoperative factors included in the Danish Knee Arthroplasty Registry?
The Danish Knee Arthroplasty Registry collects patients' characteristics and surgical information from all arthroplasties conducted in Denmark and thus provides a large nationwide cohort of patients undergoing TKA. As the training dataset, we retrieved all preoperative variables of 25,104 primary TKAs from 2012 to 2015. The same variables were retrieved from 6170 TKAs conducted in 2016, which were used as a hold-out year for temporal external validation. If a patient received bilateral TKA, only the first knee to receive surgery was included. All patients were followed for 2 years, with removal, exchange, or addition of an implant defined as TKA revision. To find the best-performing model, we created four different predictive models: a regression-based model using logistic regression with least absolute shrinkage and selection operator (LASSO), two classification tree models (random forest and gradient boosting model), and a supervised neural network. For comparison, we created a noninformative model predicting that all observations were unrevised. The four machine-learning models were trained using 10-fold cross-validation on the training dataset after adjusting for the low percentage of revisions by oversampling revised observations and undersampling unrevised observations. In the validation dataset, the models' performance was evaluated and compared by density plot, calibration plot, accuracy, Brier score, receiver operating characteristic (ROC) curve, and area under the curve (AUC). The density plot depicts the distribution of predicted probabilities, and the calibration plot graphically depicts whether the predicted probability resembled the observed probability. The accuracy indicates how often the models' predictions were correct, and the Brier score is the mean squared difference between the predicted probability and the observed outcome. The ROC curve plots the models' sensitivity against 1 minus specificity across all probability cutoffs, and the AUC is calculated from it. The AUC can be interpreted as the probability that a model assigns a higher predicted risk to a randomly chosen revised TKA than to a randomly chosen unrevised TKA; a priori, an AUC of 0.7 was chosen as the threshold for a clinically meaningful model.
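The class-rebalancing step described above (oversampling revised observations and undersampling unrevised ones) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `rebalance`, the `target_per_class` parameter, the 1:1 target ratio, and the toy rows are all assumptions, since the abstract does not state the resampling ratio or the software used.

```python
import random
from typing import List, Tuple

# Hypothetical row layout: (preoperative features, label); 1 = revised, 0 = unrevised.
Row = Tuple[dict, int]


def rebalance(rows: List[Row], target_per_class: int, seed: int = 0) -> List[Row]:
    """Oversample the revised class and undersample the unrevised class."""
    rng = random.Random(seed)
    revised = [r for r in rows if r[1] == 1]
    unrevised = [r for r in rows if r[1] == 0]
    if not revised:
        raise ValueError("need at least one revised observation")
    if target_per_class > len(unrevised):
        raise ValueError("target exceeds the number of unrevised observations")
    # Oversampling: draw revised rows WITH replacement up to the target size.
    up = [rng.choice(revised) for _ in range(target_per_class)]
    # Undersampling: draw unrevised rows WITHOUT replacement down to the target size.
    down = rng.sample(unrevised, target_per_class)
    balanced = up + down
    rng.shuffle(balanced)
    return balanced


# Made-up toy data: 100 rows, of which 5 are labeled as revised.
rows = [({"age": 60 + i % 20}, 1 if i % 20 == 0 else 0) for i in range(100)]
balanced = rebalance(rows, target_per_class=30)
print(len(balanced), sum(label for _, label in balanced))
```

In a cross-validated setup, resampling of this kind is usually applied only to the training folds, so that the held-out fold and the validation year keep the original revision rate.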
Based on the model training, age, postfracture osteoarthritis, and weight were deemed important preoperative factors within the machine-learning models. During validation, the models' performance was not different from that of the noninformative model, and with AUCs ranging from 0.57 to 0.60, no model reached the predetermined AUC threshold for a clinically useful discriminative capacity.
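To make the comparison with the noninformative model concrete, the sketch below computes accuracy, Brier score, and AUC for an all-unrevised baseline and for a weak model. All labels and probabilities are made up for illustration and are not registry data; the function names and the 0.5 classification cutoff are assumptions. The AUC is computed with the standard rank-based definition (the share of revised/unrevised pairs in which the revised case receives the higher predicted probability, with ties counted as one half).

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_prob: Sequence[float], cutoff: float = 0.5) -> float:
    """Share of observations whose thresholded prediction matches the outcome."""
    hits = sum(int(p >= cutoff) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Share of (revised, unrevised) pairs ranked correctly; ties count as 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Made-up toy outcomes: 1 = revised within 2 years, 0 = unrevised.
y_true = [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]
# Noninformative model: predicts "unrevised" (probability 0) for everyone.
baseline = [0.0] * len(y_true)
# Hypothetical weak model: low probabilities that only loosely track the outcome.
weak = [0.05, 0.10, 0.20, 0.105, 0.08, 0.30, 0.12, 0.07, 0.25, 0.22, 0.09, 0.11]

AUC_THRESHOLD = 0.7  # the a priori threshold stated in the abstract

for name, probs in [("baseline", baseline), ("weak model", weak)]:
    a = auc(y_true, probs)
    print(name, accuracy(y_true, probs), brier_score(y_true, probs), a, a >= AUC_THRESHOLD)
```

In this toy setting both models reach the same accuracy, because neither ever predicts a revision at the 0.5 cutoff. When the outcome is rare, accuracy alone therefore cannot separate an informative model from a noninformative one, which is why discrimination (AUC) and calibration are reported alongside it.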
Although several well-known presurgical risk factors for revision were coupled with four different machine learning methods, we could not develop a clinically useful model capable of predicting early TKA revisions in the Danish Knee Arthroplasty Registry based on preoperative data.
The inability to predict early TKA revision highlights that predicting revision based on preoperative information alone is difficult. Future models might benefit from including medical comorbidities and an anonymous surgeon identifier variable or may attempt to build a postoperative predictive model including intra- and postoperative factors as these may have a stronger association with early TKA revisions.