School of Engineering Technology, Purdue University, West Lafayette, IN 47906, USA.
School of Industrial Engineering, Purdue University, West Lafayette, IN 47906, USA.
Mil Med. 2023 Nov 8;188(Suppl 6):412-419. doi: 10.1093/milmed/usad175.
Remote military operations require rapid response times for effective relief and critical care. Yet the military theater presents austere conditions: communication links are unreliable and subject to physical and virtual attack and degradation at unpredictable times. Immediate medical care at these austere locations requires semi-autonomous teleoperated systems, which enable the completion of medical procedures even over interrupted networks while isolating medics from the dangers of the battlefield. However, to achieve autonomy for complex surgical and critical care procedures, robots require extensive programming or massive libraries of surgical skill demonstrations from which to learn effective policies using machine learning algorithms. Although such datasets are achievable for simple tasks, providing a large number of demonstrations of surgical maneuvers is not practical. This article presents a learning-from-demonstration method that combines knowledge extracted from demonstrations to eliminate reward shaping in reinforcement learning (RL). In addition to reducing the data required for training, the self-supervised nature of RL, in conjunction with expert knowledge-driven rewards, produces more generalizable policies that tolerate dynamic environment changes. A multimodal representation of interaction enables learning complex, contact-rich surgical maneuvers. The effectiveness of the approach is shown on the cricothyroidotomy task, a standard critical care procedure for opening the airway. We also provide a method for segmenting the teleoperator's demonstration into subtasks and classifying those subtasks using sequence modeling.
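As an illustration of the segmentation idea described above, the sketch below splits a demonstration trajectory into candidate subtask segments using a simple change-point heuristic on a 1-D signal. This is only a minimal stand-in for intuition; the article's actual segmentation model is learned from multimodal interaction features, and the `threshold` parameter and toy signal here are assumptions, not part of the original work.

```python
import numpy as np

def segment_demonstration(signal, threshold=1.0):
    """Split a 1-D demonstration signal into segments at points where the
    jump between consecutive samples exceeds a threshold. A toy change-point
    heuristic, not the paper's learned segmentation model."""
    boundaries = [0]
    for t in range(1, len(signal)):
        if abs(signal[t] - signal[t - 1]) > threshold:
            boundaries.append(t)
    boundaries.append(len(signal))
    # Pair consecutive boundaries into (start, end) index ranges.
    return [(boundaries[i], boundaries[i + 1]) for i in range(len(boundaries) - 1)]

# A toy trajectory with three flat regimes separated by abrupt jumps,
# standing in for three distinct subtasks within one demonstration.
demo = np.array([0.0, 0.1, 0.0, 5.0, 5.1, 5.0, 10.0, 10.1])
print(segment_demonstration(demo))  # → [(0, 3), (3, 6), (6, 8)]
```

In the article, each resulting segment would then be passed to a sequence-modeling classifier to assign a surgeme label.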
A database of demonstrations for the cricothyroidotomy task was collected, comprising six fundamental maneuvers referred to as surgemes. The dataset was collected by teleoperating a collaborative robotic platform, SuperBaxter, fitted with modified surgical grippers. Two learning models were then developed to process the dataset: one for automatic segmentation of the task demonstrations into a sequence of surgemes and a second for classifying each segment into labeled surgemes. Finally, a multimodal off-policy RL method with rewards learned from demonstrations was developed to learn surgeme execution from these demonstrations.
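The "rewards learned from demonstrations" component can be sketched in miniature as a reward function that scores a state by its proximity to the nearest demonstrated state, so the RL agent needs no hand-shaped reward. This is an assumption-laden simplification: the article's actual reward model, state representation, and distance measure are not specified here, and the `DemoReward` class and toy states below are purely illustrative.

```python
import numpy as np

class DemoReward:
    """Illustrative demonstration-derived reward: states close to any
    demonstrated state score higher (less negative). A stand-in for the
    paper's expert knowledge-driven reward, not its actual model."""

    def __init__(self, demo_states):
        self.demo_states = np.asarray(demo_states, dtype=float)

    def __call__(self, state):
        # Distance from the query state to every demonstrated state.
        dists = np.linalg.norm(self.demo_states - np.asarray(state, dtype=float), axis=1)
        # Negative distance to the closest one: max reward 0 on a demo state.
        return float(-dists.min())

# Toy 2-D states taken from a hypothetical demonstration.
reward_fn = DemoReward([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
print(reward_fn([1.0, 1.1]))  # near a demo state, so close to 0
print(reward_fn([5.0, 5.0]))  # far from all demo states, so strongly negative
```

In an off-policy setup such as the TD3 baseline mentioned in the results, a reward of this kind would be evaluated on transitions drawn from the replay buffer in place of a manually shaped reward.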
The task segmentation model achieved an accuracy of 98.2%. The surgeme classification model using the proposed interaction features achieved a classification accuracy of 96.25% averaged across all surgemes, compared to 87.08% without these features and 85.4% with a support vector machine classifier. Finally, the robot execution achieved a task success rate of 93.5%, compared to baselines of behavioral cloning (78.3%) and a twin-delayed deep deterministic policy gradient with shaped rewards (82.6%).
Results indicate that the proposed interaction features improve the accuracy of surgical task segmentation and classification. The proposed method for learning surgemes from demonstrations outperforms popular skill-learning methods. The effectiveness of the approach demonstrates the potential for future remote telemedicine on battlefields.