Moosa Farhana, Robertshaw Harry, Karstensen Lennart, Booth Thomas C, Granados Alejandro
School of Biomedical Engineering and Imaging Sciences, King's College London, London, UK.
AIBE, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany.
Int J Comput Assist Radiol Surg. 2025 Jun;20(6):1231-1238. doi: 10.1007/s11548-025-03360-x. Epub 2025 Apr 29.
Mechanical thrombectomy (MT) is the gold standard for treating acute ischemic stroke. However, challenges such as operator radiation exposure, reliance on operator experience, and limited treatment access remain. Although autonomous robotics could mitigate some of these limitations, current research lacks benchmarking of reinforcement learning (RL) algorithms for MT. This study aims to evaluate the performance of Deep Deterministic Policy Gradient, Twin Delayed Deep Deterministic Policy Gradient, Soft Actor-Critic, and Proximal Policy Optimization for MT.
Simulated endovascular interventions based on the open-source stEVE platform were employed to train and evaluate RL algorithms. We simulated navigation of a guidewire from the descending aorta to the supra-aortic arteries, a key phase in MT. The impact of tuning hyperparameters, such as learning rate and network size, was explored. Optimized hyperparameters were used for assessment on an MT benchmark.
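As a minimal sketch of this kind of training setup (not the authors' actual code), the following assumes a Gymnasium-compatible wrapper around a stEVE navigation task, exposed here through the hypothetical factory make_aortic_arch_env(), and uses stable-baselines3; the hyperparameters shown (learning rate and network size) correspond to the quantities tuned in the study.

```python
# Sketch: training PPO for simulated guidewire navigation.
# Assumptions: a Gymnasium-compatible environment wrapping a stEVE
# navigation task is available via the hypothetical factory
# make_aortic_arch_env(); stable-baselines3 provides the RL algorithm.

from stable_baselines3 import PPO

def train_ppo(env, learning_rate=3e-4, hidden_size=256, total_timesteps=1_000_000):
    """Train PPO with the two tuned hyperparameters: learning rate and
    network size (two hidden layers of `hidden_size` units)."""
    model = PPO(
        "MlpPolicy",
        env,
        learning_rate=learning_rate,
        policy_kwargs=dict(net_arch=[hidden_size, hidden_size]),
        verbose=0,
    )
    model.learn(total_timesteps=total_timesteps)
    return model

# Hypothetical usage:
# env = make_aortic_arch_env()   # assumed wrapper around a stEVE navigation task
# model = train_ppo(env, learning_rate=1e-4, hidden_size=400)
```

The same pattern applies to DDPG, TD3, and SAC by swapping the algorithm class, which is what makes a shared hyperparameter sweep across the four methods straightforward.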
Before tuning, Deep Deterministic Policy Gradient had the highest success rate at 80% with a procedure time of 6.87 s when navigating to the supra-aortic arteries. After tuning, Proximal Policy Optimization achieved the highest success rate at 84% with a procedure time of 5.08 s. On the MT benchmark, Twin Delayed Deep Deterministic Policy Gradient recorded the highest success rate at 68% with a procedure time of 214.05 s.
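To illustrate how success rate and procedure time can be aggregated from evaluation rollouts, the sketch below assumes the environment's terminal info dict reports a "success" flag and a simulated "procedure_time" in seconds; these field names are illustrative assumptions, not the stEVE API.

```python
# Sketch of metric aggregation over evaluation episodes.
# Assumption: the terminal `info` dict reports whether the target vessel
# was reached ("success") and the simulated procedure time in seconds
# ("procedure_time"); both field names are illustrative only.

import numpy as np

def evaluate(model, env, n_episodes=100):
    successes, times = [], []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, info = env.step(action)
            done = terminated or truncated
        successes.append(bool(info.get("success", False)))
        times.append(float(info.get("procedure_time", np.nan)))
    # Success rate over all episodes; procedure time averaged over successful runs.
    success_rate = 100.0 * np.mean(successes)
    successful_times = [t for t, s in zip(times, successes) if s]
    mean_time = float(np.nanmean(successful_times)) if successful_times else float("nan")
    return success_rate, mean_time
```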
This work advances autonomous endovascular navigation by establishing a benchmark for MT. The results emphasize the importance of hyperparameter tuning on the performance of RL algorithms. Future research should extend this benchmark to identify the most effective RL algorithm.