Liang Shuhan, Lu Wenbin, Song Rui
Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA.
Stat Theory Relat Fields. 2018;2(1):80-88. doi: 10.1080/24754269.2018.1466096. Epub 2018 May 16.
Recently, deep learning has achieved state-of-the-art performance on many difficult tasks. Deep neural networks outperform many existing popular methods in the field of reinforcement learning, and they can also identify important covariates automatically. Parameter sharing in convolutional neural networks (CNN) greatly reduces the number of parameters in the network, which allows for high scalability. However, little research has been done on deep advantage learning (A-learning). In this paper, we present a deep A-learning approach to estimate the optimal dynamic treatment regime. A-learning models the advantage function, which is of direct relevance to the goal. We use an inverse probability weighting (IPW) method to estimate the difference between potential outcomes, which does not require any model assumptions on the baseline mean function. We implement different architectures of deep CNN and convexified convolutional neural networks (CCNN). The proposed deep A-learning methods are applied to data from the STAR*D trial and are shown to outperform the penalized least squares estimator using a linear decision rule.
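The IPW idea in the abstract can be illustrated with a minimal NumPy sketch on simulated data. Under a known randomization probability (as in a trial), the pseudo-outcome Y* = A·Y/π − (1−A)·Y/(1−π) satisfies E[Y*|X] = E[Y(1)−Y(0)|X], so regressing Y* on the covariates recovers the treatment contrast without modeling the baseline mean. All variable names, the data-generating model, and the linear fit below are hypothetical; the paper replaces the linear fit with deep CNN/CCNN architectures.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 3
X = rng.normal(size=(n, p))

# Known randomization probability, as in a randomized trial (hypothetical value).
pi = 0.5
A = rng.binomial(1, pi, size=n)

# Simulated outcome: baseline mean plus treatment contrast.
# The true contrast E[Y(1)-Y(0)|X] is X[:, 0] by construction.
baseline = X @ np.array([1.0, -0.5, 0.2])
contrast = X[:, 0]
Y = baseline + A * contrast + rng.normal(scale=0.1, size=n)

# IPW pseudo-outcome: E[Y_star | X] equals the contrast, regardless of
# the (unmodeled) baseline mean function.
Y_star = A * Y / pi - (1 - A) * Y / (1 - pi)

# Simple least-squares fit of the contrast; a neural network could be
# substituted here without changing the IPW construction.
Xd = np.column_stack([np.ones(n), X])
beta = np.linalg.lstsq(Xd, Y_star, rcond=None)[0]
print(beta)  # coefficient on X[:, 0] should be close to 1
```

The baseline mean inflates the variance of Y* but not its conditional mean, which is why no baseline model is needed; larger samples or augmented (doubly robust) estimators reduce that variance.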