Department of Biomedical Engineering, Columbia University, New York, New York, USA.
Artificial Intelligence for Vision Science Laboratory, Edward S. Harkness Eye Institute, Department of Ophthalmology, Columbia University Irving Medical Center, New York, New York, USA.
Transl Vis Sci Technol. 2024 Oct 1;13(10):24. doi: 10.1167/tvst.13.10.24.
To propose a deep learning-based approach for predicting the most-fixated regions on optical coherence tomography (OCT) reports using eye tracking data of ophthalmologists, assisting them in finding medically salient image regions.
We collected eye tracking data from ophthalmology residents, fellows, and faculty as they viewed OCT reports to detect glaucoma. We used a U-Net model as the deep learning backbone and quantized eye tracking coordinates by dividing the input report into an 11 × 11 grid. The model was trained to predict the grid cells on which fixations would land in unseen OCT reports. We investigated the contribution of different variables, including the viewer's level of expertise, model architecture, and number of eye gaze patterns included in training.
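The grid quantization described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the pixel coordinate convention (origin at the top-left of the report), the handling of off-report fixations, and the "top-k cells" definition of most-fixated regions are all assumptions made for illustration. Only the 11 × 11 grid size comes from the abstract.

```python
from typing import Iterable, List, Tuple

GRID_SIZE = 11  # grid dimension stated in the abstract


def fixations_to_grid(
    fixations: Iterable[Tuple[float, float]],
    width: float,
    height: float,
    grid_size: int = GRID_SIZE,
) -> List[List[int]]:
    """Count fixations per cell of a grid_size x grid_size grid.

    Assumes (x, y) pixel coordinates with the origin at the top-left
    corner of the report image (hypothetical convention).
    """
    counts = [[0] * grid_size for _ in range(grid_size)]
    for x, y in fixations:
        # Assumption: fixations that fall outside the report are ignored.
        if not (0 <= x < width and 0 <= y < height):
            continue
        col = min(int(x * grid_size / width), grid_size - 1)
        row = min(int(y * grid_size / height), grid_size - 1)
        counts[row][col] += 1
    return counts


def most_fixated_mask(counts: List[List[int]], top_k: int) -> List[List[int]]:
    """Binary mask marking the top_k cells with the most fixations.

    The top-k rule is an illustrative assumption; the abstract does not
    say how "most-fixated" cells are selected. Ties are broken in
    row-major order, and cells with no fixations are never marked.
    """
    cells = [
        (count, r, c)
        for r, row in enumerate(counts)
        for c, count in enumerate(row)
        if count > 0
    ]
    cells.sort(key=lambda t: (-t[0], t[1], t[2]))
    mask = [[0] * len(row) for row in counts]
    for _, r, c in cells[:top_k]:
        mask[r][c] = 1
    return mask
```

A mask of this kind could serve as the per-report training target for a segmentation-style network, with the grid keeping the target small (121 cells) regardless of report resolution.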
Our approach predicted most-fixated regions in OCT reports with a precision of 0.723, a recall of 0.562, and an F1 score of 0.609. We found that a grid-based eye tracking structure enabled efficient training and that a U-Net backbone led to the best performance.
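As a hedged illustration of how such metrics can be computed on grid predictions, the sketch below scores one predicted binary grid against one ground-truth grid and then averages per-report scores. The function names and the per-report averaging are assumptions, not details given in the abstract. Per-report averaging would be one possible reason the reported F1 (0.609) differs from the harmonic mean of the reported precision and recall (about 0.632), but the abstract does not say how the scores were aggregated.

```python
from typing import List, Sequence, Tuple

Grid = List[List[int]]


def grid_prf(pred: Grid, true: Grid) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one pair of binary grids.

    A cell counts as positive when its value is nonzero. Undefined
    ratios (zero denominators) are reported as 0.0 by convention.
    """
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred, true):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_average(
    scores: Sequence[Tuple[float, float, float]],
) -> Tuple[float, float, float]:
    """Average each metric over reports (assumed aggregation scheme)."""
    n = len(scores)
    if n == 0:
        return 0.0, 0.0, 0.0
    return (
        sum(s[0] for s in scores) / n,
        sum(s[1] for s in scores) / n,
        sum(s[2] for s in scores) / n,
    )
```

With this aggregation, the averaged F1 is the mean of per-report F1 values, which generally is not the harmonic mean of the averaged precision and averaged recall.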
Our approach has the potential to assist ophthalmologists in diagnosing glaucoma by predicting the most medically salient regions on OCT reports. Our study suggests the value of eye tracking in guiding deep learning algorithms toward informative regions when experts may not be accessible.
By suggesting important OCT report regions for a glaucoma diagnosis, our model could aid in medical education and serve as a precursor for self-supervised deep learning approaches to expedite early detection of irreversible vision loss owing to glaucoma.