Schamberg Gabriel, Badgeley Marcus, Meschede-Krasa Benyamin, Kwon Ohyoon, Brown Emery N
Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
Tempus, Chicago, IL 60654, USA.
Artif Intell Med. 2022 Jan;123:102227. doi: 10.1016/j.artmed.2021.102227. Epub 2021 Dec 2.
Anesthesiologists simultaneously manage several aspects of patient care during general anesthesia. Automating administration of hypnotic agents could enable more precise control of a patient's level of unconsciousness and allow anesthesiologists to focus on the most critical aspects of patient care. Reinforcement learning (RL) algorithms can be used to fit a mapping from patient state to a medication regimen. These algorithms can learn complex control policies that, when paired with modern techniques for promoting model interpretability, offer a promising approach for developing a clinically viable system for automated anesthetic drug delivery.
We expand on our prior work applying deep RL to automated anesthetic dosing by now using a continuous-action model based on the actor-critic RL paradigm. The proposed RL agent is composed of a policy network that maps observed anesthetic states to a continuous probability density over propofol-infusion rates and a value network that estimates the favorability of observed states. We train and test three versions of the RL agent using varied reward functions. The agent is trained using simulated pharmacokinetic/pharmacodynamic models with randomized parameters to ensure robustness to patient variability. The model is tested on simulations and retrospectively on nine general anesthesia cases collected in the operating room. We utilize Shapley additive explanations to gain an understanding of the factors with the greatest influence over the agent's decision-making.
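To make the actor-critic structure concrete, the following is a minimal PyTorch sketch, not the authors' implementation: a policy network maps an observed anesthetic state vector to a Gaussian density over the propofol infusion rate, and a value network scores the state. The state dimension, layer sizes, and infusion-rate bound are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Maps an observed anesthetic state to a Gaussian over the propofol infusion rate."""
    def __init__(self, state_dim: int = 4, hidden: int = 64, max_rate: float = 200.0):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean_head = nn.Linear(hidden, 1)        # mean of the continuous action
        self.log_std = nn.Parameter(torch.zeros(1))  # state-independent log std (assumption)
        self.max_rate = max_rate                     # illustrative upper infusion-rate bound

    def forward(self, state: torch.Tensor) -> torch.distributions.Normal:
        h = self.body(state)
        # Squash the mean into [0, max_rate] so sampled infusion rates stay non-negative and bounded.
        mean = torch.sigmoid(self.mean_head(h)) * self.max_rate
        std = self.log_std.exp().expand_as(mean)
        return torch.distributions.Normal(mean, std)

class ValueNetwork(nn.Module):
    """Estimates the favorability (expected return) of an observed anesthetic state."""
    def __init__(self, state_dim: int = 4, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Example: sample an infusion rate for a single placeholder state.
policy, value = PolicyNetwork(), ValueNetwork()
state = torch.zeros(1, 4)                  # hypothetical observed state vector
dist = policy(state)
rate = dist.sample().clamp(min=0.0)        # proposed propofol infusion rate
print(rate.item(), value(state).item())
```

In an actor-critic training loop, the sampled action's log-probability under this Gaussian is weighted by an advantage estimate derived from the value network and the chosen reward function.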
The deep RL agent significantly outperformed a proportional-integral-derivative controller (median episode median absolute performance error of 1.9% ± 1.8% vs. 3.1% ± 1.1%). The model that was rewarded for minimizing total doses performed the best across simulated patient demographics (median episode median performance error of 1.1% ± 0.5%). When run on the real-world clinical cases, the agent recommended doses that were consistent with those administered by the anesthesiologist.
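The comparison uses the standard performance-error statistics from infusion-control studies: the percent error between the measured and target levels of unconsciousness at each time step, summarized per episode by its median absolute value (MDAPE) and then by the median across episodes. A minimal NumPy sketch with hypothetical measured/target traces follows; the target value and episode length are illustrative assumptions.

```python
import numpy as np

def performance_error(measured: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Percent performance error at each time step: 100 * (measured - target) / target."""
    return 100.0 * (measured - target) / target

def episode_mdape(measured: np.ndarray, target: np.ndarray) -> float:
    """Median absolute performance error (MDAPE) over a single episode."""
    return float(np.median(np.abs(performance_error(measured, target))))

# Hypothetical example: three episodes tracking a constant unconsciousness target of 0.5.
rng = np.random.default_rng(0)
target = np.full(600, 0.5)                                   # 600 time steps per episode
episodes = [target + rng.normal(0.0, 0.01, size=600) for _ in range(3)]
mdapes = [episode_mdape(m, target) for m in episodes]
print("median episode MDAPE: %.2f%%" % np.median(mdapes))    # episode-level summary statistic
```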
The proposed approach marks the first fully continuous deep RL algorithm for automating anesthetic drug dosing. The reward function used by the RL training algorithm can be flexibly designed to encourage desirable practices (e.g., using less anesthetic) and to improve performance. Through careful analysis of the learned policies, techniques for interpreting dosing decisions, and testing on clinical data, we confirm that the agent's anesthetic dosing is consistent with our understanding of best practices in anesthesia care.