Gerats Beerend G A, Wolterink Jelmer M, Mol Seb P, Broeders Ivo A M J
AI & Data Science Center, Meander Medical Center, Amersfoort, The Netherlands.
Robotics and Mechatronics, University of Twente, Enschede, The Netherlands.
Healthc Technol Lett. 2024 Dec 12;11(6):411-417. doi: 10.1049/htl2.12113. eCollection 2024 Dec.
Laparoscopic video tracking primarily focuses on two target types: surgical instruments and anatomy. The former can be used for skill assessment, while the latter is necessary for projecting virtual overlays. Whereas instrument and anatomy tracking have often been treated as two separate problems, this article proposes a method for jointly tracking all structures simultaneously. Based on a single 2D monocular video clip, a neural field is trained to represent a continuous spatiotemporal scene, which is used to create 3D tracks of all surfaces visible in at least one frame. Because instruments are small, they generally cover only a small part of the image, which reduces tracking accuracy. Therefore, enhanced class weighting is proposed to improve the instrument tracks. The authors evaluate tracking on video clips from laparoscopic cholecystectomies, finding mean tracking accuracies of 92.4% for anatomical structures and 87.4% for instruments. Additionally, the quality of depth maps obtained from the method's scene reconstructions is assessed; these pseudo-depths are shown to be of comparable quality to those of a state-of-the-art pre-trained depth estimator. On laparoscopic videos in the SCARED dataset, the method predicts depth with an MAE of 2.9 mm and a relative error of 9.2%. These results show the feasibility of using neural fields for monocular 3D reconstruction of laparoscopic scenes. Code is available via GitHub: https://github.com/Beerend/Surgical-OmniMotion.
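Two quantities from the abstract can be made concrete with a short sketch: a class-weighting scheme to counter the small image area covered by instruments, and the depth metrics (MAE in mm, relative error) used on SCARED. This is an illustrative sketch only, not the authors' code; the inverse-frequency weighting below is one common choice, and the abstract does not specify the exact scheme used.

```python
import numpy as np

def class_weights(mask, num_classes):
    """Inverse-frequency class weights: classes covering few pixels
    (e.g. instruments) receive proportionally larger loss weights."""
    counts = np.bincount(mask.ravel(), minlength=num_classes).astype(float)
    counts[counts == 0] = 1.0  # avoid division by zero for absent classes
    return counts.sum() / (num_classes * counts)

def depth_errors(pred_mm, gt_mm):
    """Mean absolute error (mm) and mean relative error between predicted
    and ground-truth depth maps; pixels with gt == 0 are treated as invalid."""
    pred = np.asarray(pred_mm, dtype=float)
    gt = np.asarray(gt_mm, dtype=float)
    valid = gt > 0
    abs_err = np.abs(pred[valid] - gt[valid])
    return abs_err.mean(), (abs_err / gt[valid]).mean()

# Toy example: class 1 ("instrument") covers 1 of 4 pixels.
mask = np.array([[0, 0], [0, 1]])
w = class_weights(mask, num_classes=2)  # → [0.667, 2.0]: rare class upweighted

# Toy 2x2 depth maps in millimetres (0 = no ground truth).
gt = np.array([[100.0, 200.0], [0.0, 50.0]])
pred = np.array([[110.0, 190.0], [40.0, 55.0]])
mae, rel = depth_errors(pred, gt)  # mae ≈ 8.33 mm, rel ≈ 8.3%
```

In a training loop, the weights would typically scale a per-pixel loss so that instrument pixels contribute as much in aggregate as the far more numerous anatomy pixels.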