M. A. Fontana, S. Lyman, G. K. Sarker, D. E. Padgett, C. H. MacLean, Hospital for Special Surgery, Center for the Advancement of Value in Musculoskeletal Care, New York, NY, USA.
M. A. Fontana, S. Lyman, Weill Cornell Medical College, Department of Healthcare Policy and Research, New York, NY, USA.
Clin Orthop Relat Res. 2019 Jun;477(6):1267-1279. doi: 10.1097/CORR.0000000000000687.
BACKGROUND: Identifying patients at risk of not achieving meaningful gains in long-term postsurgical patient-reported outcome measures (PROMs) is important for improving patient monitoring and facilitating presurgical decision support. Machine learning may help automatically select and weight many predictors to create models that maximize predictive power. However, these techniques are underused in studies of total joint arthroplasty (TJA) patients, particularly studies exploring changes in postsurgical PROMs. QUESTIONS/PURPOSES: (1) To evaluate whether machine learning algorithms, applied to hospital registry data, could predict which patients would not achieve a minimal clinically important difference (MCID) in four PROMs 2 years after TJA; (2) to explore how predictive ability changes as more information is included in modeling; and (3) to identify which variables drive the predictive power of these models.
METHODS: Data from a single, high-volume institution's TJA registry were used for this study. We identified 7239 hip and 6480 knee TJAs performed between 2007 and 2012 for which patients had completed both baseline and 2-year followup surveys for at least one PROM (among 19,187 TJAs in our registry and 43,313 total TJAs). In all, 12,203 registry TJAs had valid SF-36 physical component scores (PCS) and mental component scores (MCS) at baseline and 2 years; 7085 and 6205 had valid Hip and Knee Disability and Osteoarthritis Outcome Scores for joint replacement (HOOS JR and KOOS JR scores), respectively. Supervised machine learning refers to a class of algorithms that learn a mapping from inputs to an output based on many input-output examples. We trained three of the most popular such algorithms (logistic least absolute shrinkage and selection operator (LASSO), random forest, and linear support vector machine) to predict 2-year postsurgical MCIDs. We incrementally considered predictors available at four time points: (1) before the decision to have surgery, (2) before surgery, (3) before discharge, and (4) immediately after discharge. We evaluated the performance of each model using the area under the receiver operating characteristic curve (AUROC) on a validation sample composed of a random 20% subsample of TJAs excluded from modeling. We also considered abbreviated models that used only baseline PROMs and procedure as predictors (to isolate their predictive power). Finally, we evaluated which variables each model ranked as most predictive of 2-year MCIDs.
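The evaluation design described above (a binary MCID outcome, a random 20% holdout, and AUROC on that holdout) can be sketched in a few lines of standard-library Python. This is a minimal illustration and not the authors' code: the function names, the `threshold` argument, and the choice to treat "did not achieve the MCID" as the positive class are assumptions made for the example, and the abstract does not state the MCID values used.

```python
import random

def mcid_not_achieved(baseline, followup, threshold):
    """Label a TJA as positive when the 2-year change score falls short of
    the MCID. `threshold` is a hypothetical, PROM-specific value."""
    return (followup - baseline) < threshold

def holdout_split(n, frac=0.2, seed=0):
    """Return (train_idx, valid_idx), holding out a random `frac` of n cases."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_valid = round(n * frac)
    return idx[n_valid:], idx[:n_valid]

def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case gets a
    higher score than a randomly chosen negative case (ties count as 0.5).
    Quadratic in sample size, which is acceptable for a small illustration."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

In use, a model would be fit on the cases indexed by `train_idx` only, and `auroc` would be computed from its predicted risks for the cases indexed by `valid_idx`. An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking.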
RESULTS: The three machine learning algorithms performed in the poor-to-good range for predicting 2-year MCIDs, with AUROCs ranging from 0.60 to 0.89. They performed virtually identically for a given PROM and time point. AUROCs for the logistic LASSO models at the four time points were 0.69, 0.78, 0.78, and 0.78, respectively, for SF-36 PCS 2-year MCIDs; 0.63, 0.89, 0.89, and 0.88 for SF-36 MCS 2-year MCIDs; 0.67, 0.78, 0.77, and 0.77 for HOOS JR 2-year MCIDs; and 0.61, 0.75, 0.75, and 0.75 for KOOS JR 2-year MCIDs. Before-surgery models performed in the fair-to-good range and consistently ranked the associated baseline PROM among the most important predictors. Abbreviated LASSO models performed worse than the full before-surgery models, though they retained much of their predictive power.
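To make the second study question concrete, the reported logistic LASSO AUROCs can be tabulated and differenced between consecutive time points. The values below are copied from the results above; the variable and function names are illustrative only.

```python
# Logistic LASSO AUROCs as reported in the results.
# Order: before decision to have surgery, before surgery,
# before discharge, immediately after discharge.
lasso_auroc = {
    "SF-36 PCS": [0.69, 0.78, 0.78, 0.78],
    "SF-36 MCS": [0.63, 0.89, 0.89, 0.88],
    "HOOS JR": [0.67, 0.78, 0.77, 0.77],
    "KOOS JR": [0.61, 0.75, 0.75, 0.75],
}

def gains(values):
    """Change in AUROC between consecutive time points, rounded to 2 decimals."""
    return [round(later - earlier, 2) for earlier, later in zip(values, values[1:])]

for prom, values in lasso_auroc.items():
    print(f"{prom}: {gains(values)}")
```

For every PROM, essentially all of the gain appears between the first and second time points (from 0.09 for SF-36 PCS up to 0.26 for SF-36 MCS), and the later time points change the AUROC by at most 0.01. This is consistent with the finding that the before-surgery models rank the baseline PROM among the most important predictors.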
CONCLUSIONS: Machine learning has the potential to improve clinical decision-making and patient care by helping to prioritize resources for postsurgical monitoring and by informing presurgical discussions of the likely outcomes of TJA. Applied to presurgical registry data, such models can predict 2-year postsurgical MCIDs with fair-to-good ability. Although we report all parameters of our best-performing models, they cannot simply be applied off the shelf without proper testing. Our analyses indicate that machine learning holds much promise for predicting orthopaedic outcomes. LEVEL OF EVIDENCE: Level III, diagnostic study.