Innovation Center Computer Assisted Surgery (ICCAS), Leipzig University, Semmelweisstraße 14, 04103, Leipzig, Germany.
Department for Ear-, Nose- and Throat-Surgery, University of Leipzig Medical Center, Leipzig, Germany.
Int J Comput Assist Radiol Surg. 2020 Dec;15(12):2089-2100. doi: 10.1007/s11548-020-02264-2. Epub 2020 Oct 10.
In the context of aviation and automotive navigation technology, assistance functions are associated with predictive planning and wayfinding tasks. In endoscopic minimally invasive surgery, however, assistance so far relies primarily on image-based localization and classification. We show that navigation workflows can be described and used for the prediction of navigation steps.
A natural description vocabulary for observable anatomical landmarks in endoscopic images was defined and used to create 3850 navigation workflow sentences from 22 annotated functional endoscopic sinus surgery (FESS) recordings. The resulting FESS navigation workflows showed an imbalanced data distribution, with landmarks in the ethmoidal sinus over-represented. A transformer model was trained to predict navigation sentences in sequence-to-sequence tasks. Training was performed with the Adam optimizer and label smoothing in a leave-one-out cross-validation study. Sentences were generated using an adapted beam search algorithm with exponential decay beam rescoring. The transformer model was compared with a standard encoder-decoder model, as well as with HMM and LSTM baseline models.
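The abstract does not state the rescoring rule, so the Python sketch below shows only one plausible reading of "beam search with exponential decay beam rescoring". Each hypothesis is scored by the sum of its token log-probabilities, with the t-th token weighted by gamma**t. The decay form, the `gamma` value, the `next_token` interface, and the toy landmark vocabulary in `toy_model` are all hypothetical assumptions, not the authors' implementation.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical model interface: token prefix -> {next_token: probability}
NextTokenFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def decayed_score(log_probs: List[float], gamma: float) -> float:
    """Assumed rescoring rule: sum of log-probabilities, the t-th weighted by gamma**t."""
    return sum((gamma ** t) * lp for t, lp in enumerate(log_probs))


def beam_search(next_token: NextTokenFn, beam_width: int = 3, max_len: int = 10,
                gamma: float = 0.9, eos: str = "<eos>") -> List[Tuple[Tuple[str, ...], float]]:
    """Beam search that ranks hypotheses by decayed_score (a sketch, not the paper's code)."""
    # Each entry: (token prefix, per-token log-probabilities)
    beams: List[Tuple[Tuple[str, ...], List[float]]] = [((), [])]
    finished: List[Tuple[Tuple[str, ...], List[float]]] = []
    for _ in range(max_len):
        candidates = []
        for prefix, lps in beams:
            for tok, p in next_token(prefix).items():
                if p > 0.0:
                    candidates.append((prefix + (tok,), lps + [math.log(p)]))
        # Rescore every expansion and keep only the best beam_width of them
        candidates.sort(key=lambda c: decayed_score(c[1], gamma), reverse=True)
        beams = []
        for prefix, lps in candidates[:beam_width]:
            if prefix[-1] == eos:
                finished.append((prefix, lps))
            else:
                beams.append((prefix, lps))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses if max_len was reached
    ranked = sorted(finished, key=lambda c: decayed_score(c[1], gamma), reverse=True)
    return [(prefix, decayed_score(lps, gamma)) for prefix, lps in ranked]


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical next-token distribution over a tiny landmark vocabulary."""
    if prefix == ():
        return {"endoscope": 0.6, "instrument": 0.4}
    if prefix == ("endoscope",):
        return {"at": 0.9, "<eos>": 0.1}
    if prefix == ("instrument",):
        return {"at": 1.0}
    if prefix[-1] == "at":
        return {"middle_turbinate": 0.7, "uncinate_process": 0.3}
    return {"<eos>": 1.0}


if __name__ == "__main__":
    for tokens, score in beam_search(toy_model):
        print(" ".join(tokens), round(score, 3))
```

With `gamma = 1.0` the score reduces to the ordinary summed log-probability of standard beam search. A `gamma` below 1 down-weights the (negative) log-probabilities of later tokens, which softens the usual bias of beam search toward short hypotheses.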
The transformer model reached the highest prediction accuracy for navigation steps, at 0.53, followed by the LSTM at 0.35 and the standard encoder-decoder network at 0.32. With a sentence-generation accuracy of 0.83, prediction of navigation steps at sentence level benefits from the additional semantic information. While predictions based on standard class representations suffer from the imbalanced data distribution, the attention mechanism also took underrepresented classes into account reasonably well.
We implemented a natural-language-based prediction method for sentence-level navigation steps in endoscopic surgery. The sentence-level prediction method showed potential, indicating that word relations to navigation tasks can be learned and used to predict future steps. Further studies are needed to investigate the functionality of path prediction. The prediction approach is a first step in the field of visuo-linguistic navigation assistance for endoscopic minimally invasive surgery.