Department of Ophthalmology, Stanford University, Palo Alto, CA, USA.
Department of Ophthalmology, National Taiwan University, Taipei, Taiwan, Republic of China.
Transl Vis Sci Technol. 2024 Sep 3;13(9):5. doi: 10.1167/tvst.13.9.5.
The purpose of this study was to develop deep learning models for surgical video analysis capable of identifying minimally invasive glaucoma surgery (MIGS) and localizing the trabecular meshwork (TM).
For classification of surgical steps, the dataset comprised 313 video files (265 cataract surgeries and 48 MIGS procedures); for TM segmentation, it comprised 1743 frames (1110 containing the TM and 633 without). We used transfer learning to update a classification model pretrained to recognize standard cataract surgical steps, enabling it to also identify MIGS procedures. For TM localization, we developed three models: U-Net, Y-Net, and a cascaded model. Segmentation accuracy for the TM was measured as the average pixel error between predicted and ground truth TM locations.
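As a concrete sketch of this transfer learning step, the pretrained cataract-step classifier's backbone can be retained while its classification head is widened to cover MIGS and then fine-tuned. The abstract does not specify the backbone architecture, class granularity, or fine-tuning schedule, so the ResNet-18 backbone and the class counts below are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CATARACT_STEPS = 14  # standard cataract surgery steps (from the abstract)
NUM_TOTAL_CLASSES = 15   # assumed: the 14 steps plus one MIGS class

# Stand-in backbone; the paper's actual architecture is not given in the
# abstract, so ResNet-18 serves only as an illustration.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_CATARACT_STEPS)
# ... load the pretrained cataract-step classifier weights here ...

# Transfer learning: swap in a wider head that also covers MIGS and
# freeze the backbone so that only the new head is fine-tuned.
model.fc = nn.Linear(model.fc.in_features, NUM_TOTAL_CLASSES)
for name, param in model.named_parameters():
    if not name.startswith("fc"):
        param.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()  # per-frame step classification
```

Whether to freeze the backbone or fine-tune end to end is a design choice; freezing preserves the pretrained cataract-step features when, as here, the new-procedure data (48 MIGS videos) are scarce.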
Using transfer learning, we developed a model that achieved 87% accuracy for MIGS frame classification, with an area under the receiver operating characteristic curve (AUROC) of 0.99. The model maintained 79% accuracy in identifying the 14 standard cataract surgery steps, and the overall micro-averaged AUROC was 0.98. The U-Net model performed best in TM segmentation, with an intersection over union (IoU) score of 0.9988 and an average pixel error of 1.47.
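For reference, the two segmentation metrics can be computed as follows. The IoU definition is standard; the abstract does not define "average pixel error" precisely, so the centroid-distance reading below is an assumption.

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two boolean segmentation masks."""
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return np.logical_and(pred, gt).sum() / union

def centroid(mask: np.ndarray) -> np.ndarray:
    """Row/column centroid of a mask's foreground pixels."""
    ys, xs = np.nonzero(mask)
    return np.array([ys.mean(), xs.mean()])

def avg_pixel_error(pred_masks, gt_masks) -> float:
    """Mean Euclidean distance (in pixels) between predicted and
    ground truth TM centroids, one plausible reading of the metric."""
    dists = [np.linalg.norm(centroid(p) - centroid(g))
             for p, g in zip(pred_masks, gt_masks)]
    return float(np.mean(dists))
```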
Building on prior work developing computer vision models for cataract surgical video, we developed models that recognize MIGS procedures and precisely localize the TM with superior performance. Our work demonstrates the potential of transfer learning for extending our computer vision models to new surgeries without the need for extensive additional data collection.
Computer vision models for surgical video can underpin systems that offer automated feedback to trainees, improving surgical training and patient care.