Hsu-Hang Yeh, Simmi Sen, Jonathan C. Chou, Karen L. Christopher, Sophia Y. Wang
Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA.
Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, CA, USA.
Transl Vis Sci Technol. 2025 May 1;14(5):2. doi: 10.1167/tvst.14.5.2.
To investigate whether cataract surgical skill performance metrics automatically generated by artificial intelligence (AI) models can differentiate between trainee and faculty surgeons, and to assess the correlation between AI metrics and expert-rated skills.
Routine cataract surgical videos from residents (N = 28) and attendings (N = 29) were collected. Three video-level metrics were generated by deep learning models: phacoemulsification probe decentration, eye decentration, and zoom level change. Three instrument- and landmark-specific metrics were generated for the limbus, pupil, and various surgical instruments: total path length, maximum velocity, and area. Expert human judges assessed the surgical videos using the Objective Structured Assessment of Cataract Surgical Skill (OSACSS). Differences in AI-generated and human-rated scores between attending surgeons and trainees were assessed using t-tests, and the correlations between the two score types were examined with Pearson correlation coefficients.
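The motion metrics and correlation analysis described above can be illustrated with a minimal sketch. The function names, frame rate, and coordinate data below are illustrative assumptions, not the authors' implementation; the published pipeline derives positions from deep learning tracking models.

```python
import math

def total_path_length(points):
    # Sum of Euclidean distances between consecutive tracked positions
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def max_velocity(points, fps=30.0):
    # Largest per-frame displacement, scaled by an assumed frame rate
    return max(math.dist(a, b) for a, b in zip(points, points[1:])) * fps

def pearson_r(x, y):
    # Pearson correlation coefficient between two paired score lists
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical tracked phaco-probe tip positions (pixels per frame)
track = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
print(total_path_length(track))  # 10.0
print(max_velocity(track))       # 150.0
```

In this framing, a lower total path length and maximum velocity indicate more economical instrument movement, which is the pattern the study reports for attending surgeons.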
The phacoemulsification probe had significantly lower total path lengths, maximum velocities, and area metrics in attending videos. Attending surgeons demonstrated better phacoemulsification centration and eye centration. Most AI metrics negatively correlated with OSACSS scores, including phacoemulsification decentration (r = -0.369) and eye decentration (r = -0.394). OSACSS subitems related to eye centration and different steps of surgery also exhibited significant negative correlations with corresponding AI metrics (r ranging from -0.77 to -0.49).
Automatically generated AI metrics can differentiate between attending and trainee surgeries and correlate with human expert evaluations of surgical performance.
AI-generated metrics that correlate with surgeon skill may be useful for improving cataract surgical education.