Langenberger Benedikt, Schrednitzki Daniel, Halder Andreas, Busse Reinhard, Pross Christoph
Department of Health Care Management, Technische Universität Berlin, Berlin, Germany.
Chair of Digital Health, Economics & Policy, Hasso-Plattner-Institute, Potsdam, Germany.
BMC Med Inform Decis Mak. 2025 Mar 3;25(1):106. doi: 10.1186/s12911-025-02927-7.
Duration of surgery (DOS) varies substantially for patients undergoing hip and knee arthroplasty (HA/KA) and is a major risk factor for adverse events. We therefore aimed (1) to determine whether machine learning can predict DOS in HA/KA patients with reasonable performance using retrospective data available before surgery, (2) to assess whether machine learning outperforms multivariable regression in predictive performance and (3) to identify the most important predictor variables for DOS in both a multi-hospital and a single-hospital context.
eXtreme Gradient Boosting (XGBoost) and multivariable linear regression were used for prediction. Both models were applied to the whole dataset, which included multiple hospitals (3,704 patients), and to a single-hospital dataset (1,815 patients) from the hospital with the highest case volume in our sample. Each dataset was split into training (75%) and test (25%) data. Models were trained using 5-fold cross-validation (CV) on the training data and then applied to the test data for performance comparison.
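The data partitioning described above can be illustrated with a minimal sketch. It assumes a simple random shuffle and interleaved folds; the abstract does not say how the split was randomised or stratified, and the function names and seed are hypothetical. Only the 75/25 ratio, the 5 folds and the patient count of 3,704 come from the text.

```python
import random

def train_test_split_indices(n, test_fraction=0.25, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, test) lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold_indices(train_indices, k=5):
    """Yield (fit, validation) index lists for k-fold cross-validation."""
    # Interleaved slicing gives k folds whose sizes differ by at most one.
    folds = [train_indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validation

# Multi-hospital dataset size taken from the abstract.
train, test = train_test_split_indices(3704)
cv_folds = list(k_fold_indices(train, k=5))
```

In such a setup, hyperparameters would be tuned on the five validation folds only, and the held-out 25% would be touched once, for the final performance comparison.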
On test data in the multi-hospital setting, the mean absolute error (MAE) of XGBoost was 12.13 min (HA) and 13.61 min (KA). In the single-hospital analysis, the MAE of XGBoost on test data was 10.87 min (HA) and 12.53 min (KA). XGBoost tended to predict better than regression in all settings, but the differences were not statistically significant. Important predictors for XGBoost were physician experience, age, body mass index, patient-reported outcome measures and, in the multi-hospital analysis, the hospital.
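For readers unfamiliar with the metric, MAE is the average of the absolute differences between observed and predicted durations. The sketch below uses made-up durations purely for illustration; they are not study data, and the function name is hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Illustrative values in minutes (invented, not taken from the study).
actual_minutes = [60.0, 75.0, 90.0, 120.0]
predicted_minutes = [70.0, 70.0, 100.0, 105.0]

# Absolute errors are 10, 5, 10 and 15 minutes, so the MAE is 10.0 minutes.
mae = mean_absolute_error(actual_minutes, predicted_minutes)
```

An MAE of about 12 min therefore means that predictions were off by about 12 minutes on average, in either direction.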
Machine learning can predict DOS in both multi-hospital and single-hospital settings with reasonable performance. Performance differed slightly between regression and machine learning, but not significantly; larger datasets may improve predictive performance. Hospital indicators mattered in the multi-hospital setting despite controlling for various variables, which points to potential quality differences between hospitals.
The study was registered at the German Clinical Trials Register (DRKS) under DRKS00019916.