National Clinical Research Center of Cardiovascular Diseases, Fuwai Hospital, National Center for Cardiovascular Diseases, Beijing, People's Republic of China.
State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Beijing, People's Republic of China.
J Am Med Inform Assoc. 2022 Sep 12;29(10):1722-1732. doi: 10.1093/jamia/ocac088.
Warfarin anticoagulation management requires continual sequential decision-making: dosages must be adjusted as patients' states evolve. We aimed to leverage reinforcement learning (RL) to optimize dynamic in-hospital warfarin dosing in patients after surgical valve replacement (SVR).
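The abstract does not specify the exact RL formulation, but the dosing task naturally maps to a Markov decision process: each day the agent observes the patient's state and selects a dose. The sketch below is a minimal illustration under assumed design choices; the state features, the discrete dose grid, and the reward shaping are all hypothetical, with only the INR thresholds taken from the outcome definitions reported here.

```python
# Minimal MDP framing sketch (assumption: the authors' exact state/action/
# reward design is not given in the abstract; only the INR thresholds are).
from dataclasses import dataclass
from typing import List

TARGET_LOW, TARGET_HIGH = 1.8, 2.5   # target INR range from the abstract
SAFETY_CEILING = 3.0                 # INR > 3.0 counts as a safety violation

@dataclass
class PatientState:
    """Multidimensional daily state (illustrative features only)."""
    day: int
    inr: float                 # most recent INR measurement
    recent_doses: List[float]  # warfarin doses over preceding days, in mg
    age: float
    weight_kg: float

# Hypothetical discrete action space: candidate daily doses in mg.
DOSE_ACTIONS = [0.0, 0.75, 1.5, 2.25, 3.0, 3.75, 4.5, 5.25, 6.0]

def reward(next_inr: float) -> float:
    """Illustrative reward: favor the target range, penalize overshoot."""
    if next_inr > SAFETY_CEILING:
        return -2.0   # unsafe: supratherapeutic INR
    if TARGET_LOW <= next_inr <= TARGET_HIGH:
        return 1.0    # within the 1.8-2.5 target range
    return -0.5       # out of range but not unsafe
```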
10 408 SVR cases with warfarin dosage-response data were retrospectively collected to develop and test an RL algorithm that continuously recommends daily warfarin doses based on patients' evolving multidimensional states. The RL algorithm was compared with clinicians' actual practice and with other machine learning and clinical decision rule-based algorithms. The primary outcome was the ratio of excellent responders: patients with no in-hospital INR >3.0 and a discharge INR within the target range (1.8-2.5). The secondary outcomes were the safety responder ratio (no INR >3.0) and the target responder ratio (discharge INR within 1.8-2.5).
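The three responder definitions compose directly from per-patient INR trajectories. A sketch of how they could be evaluated, assuming each trajectory is the sequence of in-hospital INRs with the last value taken as the discharge INR:

```python
# Outcome definitions from the abstract, applied to per-patient INR
# trajectories (assumption: the final measurement is the discharge INR).
from typing import Callable, Sequence

def is_safety_responder(inrs: Sequence[float]) -> bool:
    """Secondary outcome: no in-hospital INR above 3.0."""
    return all(v <= 3.0 for v in inrs)

def is_target_responder(inrs: Sequence[float]) -> bool:
    """Secondary outcome: discharge (last) INR within 1.8-2.5."""
    return 1.8 <= inrs[-1] <= 2.5

def is_excellent_responder(inrs: Sequence[float]) -> bool:
    """Primary outcome: both safety and target criteria are met."""
    return is_safety_responder(inrs) and is_target_responder(inrs)

def responder_ratio(cohort: Sequence[Sequence[float]],
                    predicate: Callable[[Sequence[float]], bool]) -> float:
    """Share of patients in `cohort` whose trajectory satisfies `predicate`."""
    return sum(predicate(t) for t in cohort) / len(cohort)
```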
In the test set (n = 1260), the excellent responder ratio under clinicians' guidance was significantly lower than under the RL algorithm: 41.6% versus 80.8% (relative risk [RR], 0.51; 95% confidence interval [CI], 0.48-0.55), as were the safety responder ratio (83.1% versus 99.5%; RR, 0.83; 95% CI, 0.81-0.86) and the target responder ratio (49.7% versus 81.1%; RR, 0.61; 95% CI, 0.58-0.65). The RL algorithm also performed significantly better than all the other algorithms. Compared with clinicians' actual practice, the RL-optimized INR trajectories reached the target range significantly faster and remained within it longer.
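The reported effect sizes follow from the two responder rates via the standard relative-risk formula with a Wald confidence interval on the log scale. This is not necessarily the authors' exact statistical method, but the sketch below reproduces the primary-outcome figures (RR 0.51; 95% CI, 0.48-0.55) from the reported rates and n = 1260:

```python
# Relative risk with a 95% Wald CI on log(RR); a standard approximation,
# not claimed to be the authors' exact analysis code.
import math

def relative_risk_ci(a: int, n1: int, b: int, n2: int, z: float = 1.96):
    """RR of group 1 vs group 2 with a 95% CI.

    a/n1: responders/total under clinicians; b/n2: under the RL policy.
    """
    p1, p2 = a / n1, b / n2
    rr = p1 / p2
    se = math.sqrt(1/a - 1/n1 + 1/b - 1/n2)   # SE of log(RR)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi

# Excellent-responder rates from the test set (n = 1260):
# clinicians 41.6% vs RL 80.8% -> RR 0.51 (95% CI, 0.48-0.55).
rr, lo, hi = relative_risk_ci(round(0.416 * 1260), 1260,
                              round(0.808 * 1260), 1260)
print(f"RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```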
RL could offer interactive, practical clinical decision support for sequential decision-making tasks and is potentially adaptable to varied clinical scenarios. Prospective validation is needed.
An RL algorithm significantly optimized postoperative warfarin anticoagulation quality compared with clinicians' actual practice, suggesting its potential for challenging sequential decision-making tasks.