Yale University School of Medicine, New Haven, CT, USA.
Vector Institute and Department of Computer Science, University of Toronto, Toronto, ON, Canada.
Surg Innov. 2023 Feb;30(1):94-102. doi: 10.1177/15533506221095298. Epub 2022 May 3.
Advances in AI hold tremendous capacity to augment human achievements in surgery, but robust integration of deep learning algorithms with high-fidelity surgical simulation remains a challenge. We present a novel application of reinforcement learning (RL) for automating surgical maneuvers in a graphical simulation. In the Unity3D game engine, the Machine Learning-Agents package was integrated with the NVIDIA FleX particle simulator to develop autonomously behaving RL-trained scissors. Proximal Policy Optimization (PPO) was used to reward desired behavior, such as movement along a desired trajectory and optimized cutting maneuvers along the deformable tissue-like object. Constant and proportional reward functions were tested, and TensorFlow analytics was used to inform hyperparameter tuning and evaluate performance. RL-trained scissors reliably manipulated the rendered tissue, which was simulated with soft-tissue properties. A desirable trajectory of the autonomously behaving scissors was achieved along 1 axis. Proportional rewards performed better than constant rewards. Cumulative reward and PPO metrics did not consistently improve across RL-trained scissors in the setting with movement across 2 axes (horizontal and depth). Game engines hold promising potential for the design and implementation of RL-based solutions to simulated surgical subtasks. Task completion was sufficiently achieved for one-dimensional movement in simulations with and without tissue rendering. Further work is needed to optimize network architecture and parameter tuning for tasks of increasing complexity.
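The abstract does not give the exact form of the two reward functions. As a minimal sketch, assuming the reward depends on the scissors' distance to the desired trajectory before and after each step, the distinction between a constant and a proportional reward might look like the following. The function names, the `bonus` and `scale` parameters, and the distance-based formulation are all hypothetical and are not taken from the study.

```python
def constant_reward(prev_dist: float, new_dist: float, bonus: float = 0.01) -> float:
    """Hypothetical constant reward: a fixed-size signal per step.

    Returns +bonus if the step moved the agent closer to the desired
    trajectory, -bonus if it moved away, and 0.0 if the distance is unchanged.
    The magnitude of the improvement is ignored.
    """
    if new_dist < prev_dist:
        return bonus
    if new_dist > prev_dist:
        return -bonus
    return 0.0


def proportional_reward(prev_dist: float, new_dist: float, scale: float = 1.0) -> float:
    """Hypothetical proportional reward: scaled by the change in distance.

    Larger improvements earn larger rewards, and moving away from the
    trajectory is penalized in proportion to how far the agent drifted.
    """
    return scale * (prev_dist - new_dist)


if __name__ == "__main__":
    # A large step and a small step toward the trajectory.
    for prev_dist, new_dist in [(1.0, 0.5), (1.0, 0.9)]:
        print(
            prev_dist,
            new_dist,
            constant_reward(prev_dist, new_dist),
            proportional_reward(prev_dist, new_dist),
        )
```

Under these assumptions, the constant reward cannot distinguish a large step toward the trajectory from a small one, while the proportional reward gives a graded signal. That is one plausible reading of why proportional rewards performed better, though the abstract itself does not state the mechanism.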