Department of Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China.
Department of Ophthalmology, Shanghai Children's Hospital, School of Medicine, Shanghai Jiao Tong University, Lu Ding Road # 355, Shanghai, 200000, China.
BMC Ophthalmol. 2024 Jun 10;24(1):242. doi: 10.1186/s12886-024-03504-8.
Learning to perform strabismus surgery is an essential aspect of ophthalmologists' surgical training. An automated classification strategy for surgical steps could improve the effectiveness of training curricula and enable efficient evaluation of residents' performance. To this end, we aimed to develop and validate a deep learning (DL) model for automatically detecting strabismus surgery steps in videos.
In this study, we gathered 479 strabismus surgery videos from Shanghai Children's Hospital, affiliated with Shanghai Jiao Tong University School of Medicine, spanning July 2017 to October 2021. The videos were manually cut into 3345 clips covering the eight strabismus surgical steps defined in the International Council of Ophthalmology's Ophthalmology Surgical Competency Assessment Rubric (ICO-OSCAR: strabismus). The video dataset was randomly split at the eye level into training (60%), validation (20%), and testing (20%) sets. We evaluated two hybrid DL algorithms: a recurrent neural network (RNN)-based model and a Transformer-based model. The evaluation metrics included accuracy, area under the receiver operating characteristic curve (AUC), precision, recall, and F1-score.
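The eye-level split described above means that all clips taken from the same eye must land in the same partition, so that clips from one eye never leak between training and testing. A minimal sketch of such a grouped split is shown below; the `clips` data structure, the function name, and the eye identifiers are illustrative assumptions, not the authors' published code.

```python
import random
from collections import defaultdict

def split_by_eye(clips, seed=42, train=0.6, val=0.2):
    """Split clips into train/val/test sets at the eye level.

    `clips` is a list of (eye_id, clip_id) pairs (hypothetical structure).
    All clips belonging to one eye end up in the same partition.
    """
    # Group clips by eye so a whole eye is assigned to one partition.
    by_eye = defaultdict(list)
    for eye_id, clip in clips:
        by_eye[eye_id].append(clip)

    # Shuffle the eyes (not the clips) with a fixed seed, then cut 60/20/20.
    eyes = sorted(by_eye)
    random.Random(seed).shuffle(eyes)
    n_train = int(len(eyes) * train)
    n_val = int(len(eyes) * val)
    groups = {
        "train": eyes[:n_train],
        "val": eyes[n_train:n_train + n_val],
        "test": eyes[n_train + n_val:],
    }
    # Expand each partition's eyes back into their clips.
    return {name: [c for e in ids for c in by_eye[e]]
            for name, ids in groups.items()}
```

Because the split is performed over eyes rather than clips, the partition sizes are approximate in terms of clips: an eye with many clips carries all of them into its partition.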
The DL models identified the steps in video clips of strabismus surgery with a macro-average AUC of 1.00 (95% CI 1.00-1.00) for the Transformer-based model and 0.98 (95% CI 0.97-1.00) for the RNN-based model. The Transformer-based model yielded higher accuracy than the RNN-based model (0.96 vs. 0.83, p < 0.001). In detecting the individual steps of strabismus surgery, the predictive ability of the Transformer-based model was also better than that of the RNN-based model: precision ranged from 0.90 to 1.00 for the Transformer-based model versus 0.75 to 0.94 for the RNN-based model, and F1-score ranged from 0.93 to 1.00 versus 0.78 to 0.92.
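The per-step precision, recall, and F1 ranges reported above come from the standard per-class definitions for multi-class classification. As a sketch (the step labels and function name here are illustrative, not taken from the paper), the per-class metrics can be computed as follows:

```python
def per_class_prf(y_true, y_pred, labels):
    """Per-class precision, recall, and F1 for multi-class predictions.

    `y_true`/`y_pred` are parallel lists of step labels (e.g. surgical
    step names); `labels` is the set of classes to score.
    """
    stats = {}
    for c in labels:
        # One-vs-rest counts for class c.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        stats[c] = {"precision": prec, "recall": rec, "f1": f1}
    return stats
```

Macro-averaging, as used for the reported AUC, simply averages the per-class values with equal weight per class, so rare surgical steps count as much as frequent ones.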
The DL models can automatically identify the steps in strabismus surgery videos with high accuracy, and Transformer-based algorithms show excellent performance when modeling the spatiotemporal features of video frames.