Sample-Efficient Reinforcement Learning Controller for Deep Brain Stimulation in Parkinson's Disease.

Authors

Ravivarapu Harsh, Bagwe Gaurav, Yuan Xiaoyong, Yu Chunxiu, Zhang Lan

Affiliations

Department of Electrical and Computer Engineering, Clemson University, Clemson, SC.

Department of Biomedical Engineering, Michigan Technological University, Houghton, Michigan.

Publication

ArXiv. 2025 Jul 8:arXiv:2507.06326v1.

PMID:40671962

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12265577/

Abstract

Deep brain stimulation (DBS) is an established intervention for Parkinson's disease (PD), but conventional open-loop systems lack adaptability, are energy-inefficient due to continuous stimulation, and provide limited personalization to individual neural dynamics. Adaptive DBS (aDBS) offers a closed-loop alternative, using biomarkers such as beta-band oscillations to dynamically modulate stimulation. While reinforcement learning (RL) holds promise for personalized aDBS control, existing methods suffer from high sample complexity, unstable exploration in binary action spaces, and limited deployability on resource-constrained hardware. We propose SEA-DBS, a sample-efficient actor-critic framework that addresses the core challenges of RL-based adaptive neurostimulation. SEA-DBS integrates a predictive reward model to reduce reliance on real-time feedback and employs Gumbel-Softmax-based exploration for stable, differentiable policy updates in binary action spaces. Together, these components improve sample efficiency, exploration robustness, and compatibility with resource-constrained neuromodulatory hardware. We evaluate SEA-DBS on a biologically realistic simulation of Parkinsonian basal ganglia activity, demonstrating faster convergence, stronger suppression of pathological beta-band power, and resilience to post-training FP16 quantization. Our results show that SEA-DBS offers a practical and effective RL-based aDBS framework for real-time, resource-constrained neuromodulation.
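The abstract credits Gumbel-Softmax-based exploration with stable, differentiable policy updates over the binary stimulate/don't-stimulate action. The sketch below illustrates the generic Gumbel-Softmax (Concrete) relaxation for a two-way action in plain Python. It is not the SEA-DBS code: the function names, the OFF = 0 / ON = 1 index convention, the example logits, and the temperature values are illustrative assumptions. In a real actor-critic setup the relaxed vector `y` would be computed inside an autodiff framework so that gradients can flow back into the actor's logits.

```python
import math
import random
from typing import List, Sequence, Tuple

OFF, ON = 0, 1  # hypothetical index convention for the binary stimulation action


def sample_gumbel(rng: random.Random) -> float:
    """Draw one standard Gumbel(0, 1) sample by inverse CDF: -log(-log(U))."""
    u = rng.random()
    # Clamp away from 0 and 1 so neither logarithm receives 0.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax_binary(
    logits: Tuple[float, float], tau: float, rng: random.Random
) -> Tuple[List[float], int]:
    """Relaxed and hard samples for a two-way OFF/ON action.

    Returns y = softmax((logits + g) / tau) with g ~ Gumbel(0, 1) per entry,
    plus the hard action argmax(y). With autodiff, a straight-through variant
    would use the hard one-hot in the forward pass and the gradient of y in
    the backward pass; plain Python has no autodiff, so only values are
    computed here.
    """
    if tau <= 0:
        raise ValueError("temperature tau must be positive")
    perturbed = [(logit + sample_gumbel(rng)) / tau for logit in logits]
    y = softmax(perturbed)
    action = OFF if y[OFF] >= y[ON] else ON
    return y, action


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical actor output favoring stimulation: P(ON) = 0.7.
    logits = (math.log(0.3), math.log(0.7))
    n = 10_000
    on_count = 0
    for _ in range(n):
        _, a = gumbel_softmax_binary(logits, tau=0.5, rng=rng)
        on_count += a  # ON == 1, OFF == 0
    print(f"empirical P(ON) = {on_count / n:.3f}")  # should be close to 0.7
    y, a = gumbel_softmax_binary(logits, tau=0.1, rng=rng)
    print("relaxed sample at tau=0.1:", [round(v, 3) for v in y], "-> action", a)
```

Because adding Gumbel noise and taking the argmax samples exactly from softmax(logits) (the Gumbel-max trick), the hard ON rate in the demo should land near 0.7 for any positive `tau`. The temperature only controls how close the relaxed vector `y` is to one-hot: lower values reduce the bias of the gradient surrogate at the cost of higher variance.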
