Center for Orthopaedic Biomechanics, University of Denver, 2155 E Wesley Ave, Denver, CO, 80210, USA.
Unmanned Systems Research Institute, University of Denver, 2155 E Wesley Ave, Denver, CO, 80210, USA.
Int J Comput Assist Radiol Surg. 2023 Dec;18(12):2125-2142. doi: 10.1007/s11548-023-02890-6. Epub 2023 Apr 30.
Depending on the associated speed and accuracy requirements, multiple applications in open surgical environments may benefit from adopting markerless computer vision. The current work evaluates vision models for 6-degree-of-freedom (6-DoF) pose estimation of surgical instruments in RGB scenes. Potential use cases are discussed based on the observed performance.
Convolutional neural networks (CNNs) were developed with simulated training data for 6-DoF pose estimation of a representative surgical instrument in RGB scenes. Trained models were evaluated on both simulated and real-world scenes. Real-world scenes were produced by using a robotic manipulator to procedurally generate a wide range of object poses.
CNNs trained in simulation transferred to real-world evaluation scenes with a mild decrease in pose accuracy. Model performance was sensitive to input image resolution and orientation prediction format. The model with highest accuracy demonstrated mean in-plane translation error of 13 mm and mean long-axis orientation error of 5° in simulated evaluation scenes. Similar errors of 29 mm and 8° were observed in real-world scenes.
6-DoF pose estimators can predict object pose in RGB scenes at real-time inference speeds. The observed pose accuracy suggests that applications such as coarse-grained guidance, surgical skill evaluation, or instrument tracking for tray optimization may benefit from markerless pose estimation.
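The abstract reports sensitivity to the orientation prediction format and a "long-axis orientation error" metric. As an illustrative sketch (the paper's exact format is not stated here), one common continuous orientation representation for CNN heads is a 6-D output that is mapped to a rotation matrix by Gram–Schmidt orthonormalization; the long-axis error can then be computed as the angle between the instrument's long axis under the predicted and ground-truth rotations. All function names and the choice of long axis below are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def sixd_to_rotation_matrix(sixd):
    """Map a raw 6-D orientation prediction (two unnormalized 3-vectors)
    to a valid rotation matrix via Gram-Schmidt orthonormalization.
    This is one common CNN orientation-head format, used here purely
    as an illustrative assumption."""
    a1, a2 = np.asarray(sixd[:3], float), np.asarray(sixd[3:], float)
    b1 = a1 / np.linalg.norm(a1)                 # first orthonormal column
    b2 = a2 - np.dot(b1, a2) * b1                # remove component along b1
    b2 = b2 / np.linalg.norm(b2)
    b3 = np.cross(b1, b2)                        # completes right-handed frame
    return np.stack([b1, b2, b3], axis=1)        # columns are the basis

def long_axis_error_deg(R_pred, R_true, axis=np.array([0.0, 0.0, 1.0])):
    """Angle in degrees between the instrument long axis under the
    predicted and ground-truth rotations (the long axis is assumed to
    be the local z-axis here)."""
    v_pred = R_pred @ axis
    v_true = R_true @ axis
    cos_ang = np.clip(np.dot(v_pred, v_true), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_ang)))
```

For example, the raw prediction [1, 0, 0, 0, 1, 0] maps to the identity rotation, giving a long-axis error of 0° against an identity ground truth.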