Institute for Informatics, Washington University School of Medicine, St. Louis, MO.
Division of Biostatistics, Washington University School of Medicine, St. Louis, MO.
Spine (Phila Pa 1976). 2023 Aug 15;48(16):1138-1147. doi: 10.1097/BRS.0000000000004734. Epub 2023 May 29.
Retrospective cohort.
The aim of this study was to design a risk-stratified benchmarking tool for adolescent idiopathic scoliosis (AIS) surgeries.
Machine learning (ML) is an emerging method for prediction modeling in orthopedic surgery. Benchmarking is an established method of process improvement and an area of opportunity for ML methods. Current surgical benchmarking tools often rely on rankings, and no "gold standard" for comparison exists.
Data from 6076 AIS surgeries were collected from a multicenter registry and divided into three datasets encompassing surgeries performed (1) during the entire registry, (2) during the past 10 years, and (3) during the last 5 years of the registry. We trained three ML regression models (baseline linear regression, gradient boosting, and eXtreme gradient boosting, i.e., XGBoost) on each data subset to predict each of five outcome variables: length of stay (LOS), estimated blood loss (EBL), operative time, Scoliosis Research Society (SRS)-Pain, and SRS-Self-Image. Performance was categorized as "below expected" if worse than the mean by more than one standard deviation (SD), "as expected" if within 1 SD of the mean, and "better than expected" if better than the mean by more than 1 SD.
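The three-tier categorization above can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several labeled assumptions: the model's prediction for a case serves as its expected value (the "mean"), the SD band is the SD of observed-minus-predicted residuals, and lower values are better for LOS, EBL, and operative time while higher values are better for SRS scores. The `Case` type and the `window`, `residual_sd`, and `classify` helpers are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from statistics import stdev


@dataclass
class Case:
    year: int        # year of surgery (hypothetical field)
    observed: float  # observed outcome, e.g. LOS in days
    expected: float  # model-predicted outcome for this case


def window(cases, last_year, years=None):
    """Keep cases from the last `years` registry years; None keeps all cases."""
    if years is None:
        return list(cases)
    start = last_year - years + 1
    return [c for c in cases if c.year >= start]


def residual_sd(cases):
    # Assumption: the 1-SD band is the sample SD of (observed - expected).
    return stdev(c.observed - c.expected for c in cases)


def classify(observed, expected, sd, lower_is_better=True):
    """Return the benchmark category for one case under a +/-1 SD band."""
    diff = observed - expected
    if not lower_is_better:
        # Flip the sign so that a positive diff always means "worse".
        diff = -diff
    if diff > sd:
        return "below expected"
    if diff < -sd:
        return "better than expected"
    return "as expected"  # within 1 SD, boundaries included
```

For example, with a hypothetical SD of 2.0 days, a case with an observed LOS of 7.5 days against a predicted 5.0 days falls in "below expected", while an observed 6.0 days is "as expected". For SRS scores, passing `lower_is_better=False` reverses the direction.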
Ensemble ML methods classified performance better than traditional regression techniques for LOS, EBL, and operative time. The best-performing models for predicting LOS and EBL were trained on data collected in the last 5 years, while operative time used the entire 10-year dataset. No models were able to predict SRS-Pain or SRS-Self-Image in any useful manner. Precise point estimates for continuous variables had high average errors.
Classification of benchmark outcomes is improved with ensemble ML techniques, which may provide much-needed case adjustment for a surgeon performance program. Precise estimates of health-related quality-of-life scores and continuous variables were not possible, suggesting that performance classification is a better method of performance evaluation.