Assink Nick, Gonzalez-Perrino Maria P, Santana-Trejo Raul, Doornberg Job N, Hoekstra Harm, Kraeima Joep, IJpma Frank F A
Department of Trauma Surgery, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands.
3D Lab, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands.
Clin Orthop Relat Res. 2025 Mar 12. doi: 10.1097/CORR.0000000000003442.
BACKGROUND: When faced with a severe intraarticular injury such as a tibial plateau fracture, patients count on surgeons to estimate their prognosis accurately. Unfortunately, few tools enable precise prognosis estimation tailored to each patient's unique circumstances, including individual and fracture-specific characteristics. In this study, we developed and validated a clinical prediction model using machine-learning algorithms to estimate the 2- and 5-year risk of TKA after tibial plateau fractures.
QUESTIONS/PURPOSES: Can machine learning-based probability calculators estimate the 2- and 5-year probability of conversion to TKA in patients with a tibial plateau fracture?
METHODS: A multicenter, cross-sectional study was performed at six hospitals in patients treated for a tibial plateau fracture between 2003 and 2019. In total, 2057 patients were eligible for inclusion and were sent an informed consent form and a questionnaire asking whether they had undergone conversion to TKA. Conversion status was known at a minimum of 2 years for 56% (1160 of 2057) of patients and at a minimum of 5 years for 53% (1082 of 2057). The mean follow-up among responders was 6 ± 4 years after injury. An analysis of nonresponders found that responders were slightly older than nonresponders (53 ± 16 years versus 51 ± 17 years; p = 0.001), were more often women (68% [788 of 1160] versus 58% [523 of 897]; p = 0.001), were less often treated nonoperatively (30% [346 of 1160] versus 43% [387 of 897]; p = 0.001), and had larger fracture gaps (6.4 ± 6.3 mm versus 4.2 ± 5.2 mm; p < 0.001) and step-offs (6.3 ± 5.7 mm versus 4.5 ± 4.7 mm; p < 0.001). The AO Foundation/Orthopaedic Trauma Association (AO/OTA) fracture classification did not differ between nonresponders and responders (B1 11% versus 15%, B2 16% versus 19%, B3 45% versus 39%, C2 6% versus 8%, C3 22% versus 17%; p = 0.26). Of the responders, 70% (814 of 1160) were treated with open reduction and internal fixation and 30% (346 of 1160) were treated nonoperatively with a cast. Most fractures (80% [930 of 1160]) were AO/OTA type B, and 20% (230 of 1160) were type C. Conversion to TKA occurred in 7% (79 of 1160) of patients at 2 years and in 10% (109 of 1082) at 5 years. Patient characteristics were retrieved from electronic patient records, and imaging data were shared with the initiating center, where fracture characteristics were determined. Features derived from the follow-up questionnaires, electronic patient records, and radiographic assessments were eligible for use in developing the prediction model.
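The responder versus nonresponder comparisons of proportions can be approximately reproduced from the reported counts. The sketch below is an illustration only: it assumes a Pearson chi-square test without continuity correction, which the abstract does not name, so its p values need not match the reported ones. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p). With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of row and column
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))


responders, nonresponders = 1160, 897

# Women: 788 of 1160 responders versus 523 of 897 nonresponders
women_chi2, women_p = chi_square_2x2(788, responders - 788, 523, nonresponders - 523)
# Nonoperative treatment: 346 of 1160 responders versus 387 of 897 nonresponders
nonop_chi2, nonop_p = chi_square_2x2(346, responders - 346, 387, nonresponders - 387)

print(f"women: {788 / responders:.0%} vs {523 / nonresponders:.0%}, p = {women_p:.1e}")
print(f"nonoperative: {346 / responders:.0%} vs {387 / nonresponders:.0%}, p = {nonop_p:.1e}")
```

Under this assumed test, both differences fall well below the conventional 0.001 threshold; a different test (for example, one with a continuity correction) would give slightly different values.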
The first step was data cleaning, which consisted of simple type formatting and standardization of numerical columns. Feature selection was then based on a review of the published evidence and expert opinion, followed by bivariate analysis of the identified features. The features included in the models were age, gender, BMI, AO/OTA fracture classification, fracture displacement (gap and step-off), medial proximal tibial alignment, and posterior proximal tibial alignment. The data set was used to train three models: logistic regression, random forest, and XGBoost. Logistic regression models a linear relationship between the features and the log-odds of the outcome; random forest captures nonlinear relationships by combining many decision trees; and XGBoost builds trees sequentially, with each tree correcting the errors of the previous ones, and uses regularization to limit overfitting. The models were tested with a sixfold, leave-one-center-out validation approach: each model was trained on data from five of the six centers and validated on the remaining center, which had been left out of training. Performance was assessed with the area under the receiver operating characteristic curve (AUC), which measures a model's ability to distinguish patients who underwent TKA from those who did not. The AUC ranges from 0 to 1; 0.5 indicates discrimination no better than chance, and values closer to 1 indicate better performance. Bootstrapping was used as a resampling technique to assess the robustness of the results. In addition, calibration curves were plotted, and calibration was assessed with the calibration slope and intercept. The calibration plot compares the estimated probabilities with the observed probabilities for the primary outcome. The calibration slope evaluates the agreement between predicted probabilities and observed outcomes (1 = perfect, < 1 = overfit, > 1 = underfit). The calibration intercept indicates systematic bias (0 = perfect, negative = overestimation, positive = underestimation). Last, the Brier score, the mean squared error of the predicted probabilities (0 = perfect), was calculated.
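The validation scheme and performance metrics described above can be illustrated with a small, dependency-free Python sketch. It is not the authors' code: the abstract does not state which software was used, all function names are hypothetical, and the toy data are invented. The calibration slope and intercept follow the common logistic recalibration approach (refitting the outcome on the logit of the predicted probability, with the slope fixed at 1 for the intercept), which is assumed here to correspond to the reported metrics.

```python
import math


def leave_one_center_out(centers):
    """Yield (held_out_center, train_idx, test_idx), one fold per center."""
    for held_out in sorted(set(centers)):
        train = [i for i, c in enumerate(centers) if c != held_out]
        test = [i for i, c in enumerate(centers) if c == held_out]
        yield held_out, train, test


def auc(y, p):
    """Probability that a random case is ranked above a random non-case (ties count 1/2)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier(y, p):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def logit(p):
    return math.log(p / (1 - p))


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def calibration_slope(y, p, iters=100):
    """Fit logit P(y=1) = a + b * logit(p) by Newton-Raphson and return b."""
    lp = [logit(pi) for pi in p]
    a, b = 0.0, 1.0
    for _ in range(iters):
        mu = [sigmoid(a + b * x) for x in lp]
        w = [m * (1 - m) for m in mu]
        # Score X'(y - mu) and information X'WX for the design X = [1, lp]
        g0 = sum(yi - m for yi, m in zip(y, mu))
        g1 = sum((yi - m) * x for yi, m, x in zip(y, mu, lp))
        h00 = sum(w)
        h01 = sum(wi * x for wi, x in zip(w, lp))
        h11 = sum(wi * x * x for wi, x in zip(w, lp))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) < 1e-12 and abs(db) < 1e-12:
            break
    return b


def calibration_intercept(y, p, iters=100):
    """Fit logit P(y=1) = a + logit(p) with the slope fixed at 1 (offset) and return a."""
    lp = [logit(pi) for pi in p]
    a = 0.0
    for _ in range(iters):
        mu = [sigmoid(a + x) for x in lp]
        step = sum(yi - m for yi, m in zip(y, mu)) / sum(m * (1 - m) for m in mu)
        a += step
        if abs(step) < 1e-12:
            break
    return a


# Hypothetical toy data: 10 patients at each predicted risk with 10 * risk events,
# so these predictions are perfectly calibrated by construction.
y, p = [], []
for risk in [0.1, 0.2, 0.3, 0.4, 0.5]:
    events = round(10 * risk)
    y += [1] * events + [0] * (10 - events)
    p += [risk] * 10

p_extreme = [sigmoid(2 * logit(pi)) for pi in p]  # log-odds doubled, mimicking overfitting
p_low = [sigmoid(logit(pi) - 1) for pi in p]  # log-odds shifted down by 1

for center, train, test in leave_one_center_out(["A", "A", "B", "C"]):
    print("held out:", center, "train n =", len(train), "test n =", len(test))
print("AUC:", round(auc(y, p), 3), "Brier:", round(brier(y, p), 3))
print("slope (calibrated):", round(calibration_slope(y, p), 3))
print("slope (too extreme):", round(calibration_slope(y, p_extreme), 3))
print("intercept (shifted down):", round(calibration_intercept(y, p_low), 3))
```

In this construction, doubling the predicted log-odds yields a calibration slope of about 0.5, showing how overly extreme predictions produce a slope below 1. Shifting the predicted log-odds down by 1 yields an intercept of about 1, because the observed event rate then exceeds the predicted one.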
RESULTS: There were no differences among the models in discrimination (sensitivity and specificity); the AUCs overlapped broadly and ranged from 0.76 to 0.83. Calibration was best for logistic regression in both the 2- and 5-year models, with slopes of 0.82 (random forest 0.60, XGBoost 0.26) and 0.95 (random forest 0.85, XGBoost 0.48), respectively, and an intercept of 0.01 in both (random forest 0.01 to 0.02; XGBoost 0.05 to 0.07). Brier scores were similar across models, ranging from 0.06 to 0.09. Because logistic regression performed best overall, with comparable discrimination and the best calibration, we chose it as the final prediction model. The prediction tool is freely available as a web application at https://3dtrauma.shinyapps.io/tka_prediction/.
CONCLUSIONS: We developed a personalized risk assessment tool to support clinical decision-making and patient counseling. Our findings demonstrate that machine-learning algorithms, particularly logistic regression, can provide accurate and reliable predictions of conversion to TKA at 2 and 5 years after a tibial plateau fracture. Once it complies with medical device regulations, the tool can be used quickly and easily by surgeons who perform fracture surgery when counseling patients in the clinic or emergency department. External validation is needed to assess its performance in other institutions and countries; to account for patient and surgeon preferences, resources, and cultures; and to further strengthen its clinical applicability.
LEVEL OF EVIDENCE: Level III, therapeutic study.