Awasthi Raghav, Guliani Keerat Kaur, Khan Saif Ahmad, Vashishtha Aniket, Gill Mehrab Singh, Bhatt Arshita, Nagori Aditya, Gupta Aniket, Kumaraguru Ponnurangam, Sethi Tavpritesh
Indraprastha Institute of Information Technology Delhi, India.
Indian Institute of Technology Roorkee, India.
Intell Based Med. 2022;6:100060. doi: 10.1016/j.ibmed.2022.100060. Epub 2022 May 20.
A vaccine is our best bet for mitigating the ongoing onslaught of the pandemic. However, vaccines are also expected to be a limited resource. An optimal allocation strategy, especially in countries with access inequities and temporally separated hot-spots, might be an effective way of halting the spread of the disease. We approach this problem by proposing a novel pipeline that dovetails Deep Reinforcement Learning models into a Contextual Bandits approach for optimizing vaccine distribution. Whereas the Reinforcement Learning models suggest better actions and rewards, Contextual Bandits allow the online modifications that may need to be made on a day-to-day basis in real-world scenarios. We evaluate this framework against a naive approach that distributes vaccine in proportion to the incidence of cases, in five different States across India (Assam, Delhi, Jharkhand, Maharashtra and Nagaland), and demonstrate up to 9039 potential infections prevented and a significant increase in the efficacy of limiting the spread over a period of 45 days. Our models and the platform are extensible to all states of India and potentially across the globe. We also propose novel evaluation strategies, including standard compartmental model-based projections and a causality-preserving evaluation of our model. Since all models carry assumptions that may need to be tested in various contexts, we open-source our model and contribute a new reinforcement learning environment compatible with OpenAI Gym to make it extensible for real-world applications across the globe.
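The naive baseline mentioned in the abstract, distributing vaccine in proportion to the incidence of cases, can be illustrated with a minimal sketch. This is not the authors' code: the function name `allocate_proportional`, the largest-remainder rounding, the even split when no cases are reported, and the case counts in the example are all assumptions made for illustration.

```python
def allocate_proportional(total_doses: int, cases: dict[str, int]) -> dict[str, int]:
    """Split total_doses across regions in proportion to their case counts.

    Hypothetical helper: rounding uses the largest-remainder method so that
    the integer allocations always sum to total_doses.
    """
    if total_doses < 0 or any(c < 0 for c in cases.values()):
        raise ValueError("doses and case counts must be non-negative")
    if not cases:
        return {}

    # Assumption: if no cases are reported anywhere, fall back to an even split.
    total_cases = sum(cases.values())
    weights = cases if total_cases > 0 else {region: 1 for region in cases}
    total_weight = sum(weights.values())

    # Integer share for each region, plus the remainder left by the division.
    shares = {r: total_doses * w // total_weight for r, w in weights.items()}
    remainders = {r: total_doses * w % total_weight for r, w in weights.items()}

    # Hand out the doses lost to rounding, largest remainder first.
    leftover = total_doses - sum(shares.values())
    by_remainder = sorted(weights, key=lambda r: remainders[r], reverse=True)
    for region in by_remainder[:leftover]:
        shares[region] += 1
    return shares


# Made-up case counts for the five States named in the abstract.
example_cases = {
    "Assam": 100,
    "Delhi": 300,
    "Jharkhand": 50,
    "Maharashtra": 500,
    "Nagaland": 50,
}
example_allocation = allocate_proportional(10_000, example_cases)
print(example_allocation)
```

A learned policy of the kind the abstract describes would replace this fixed rule with allocations chosen from the observed context; the sketch only shows the comparison point.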