Proper Inference for Value Function in High-Dimensional Q-Learning for Dynamic Treatment Regimes

Authors

Zhu Wensheng, Zeng Donglin, Song Rui

Affiliations

Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China

Departments of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA

Publication information

J Am Stat Assoc. 2019;114(527):1404-1417. doi: 10.1080/01621459.2018.1506341. Epub 2018 Oct 29.

DOI: 10.1080/01621459.2018.1506341

PMID: 31929664

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6953729/

Abstract

Dynamic treatment regimes are a set of decision rules and each treatment decision is tailored over time according to patients' responses to previous treatments as well as covariate history. There is a growing interest in development of correct statistical inference for optimal dynamic treatment regimes to handle the challenges of non-regularity problems in the presence of non-respondents who have zero-treatment effects, especially when the dimension of the tailoring variables is high. In this paper, we propose a high-dimensional Q-learning (HQ-learning) to facilitate the inference of optimal values and parameters. The proposed method allows us to simultaneously estimate the optimal dynamic treatment regimes and select the important variables that truly contribute to the individual reward. At the same time, hard thresholding is introduced in the method to eliminate the effects of the non-respondents. The asymptotic properties for the parameter estimators as well as the estimated optimal value function are then established by adjusting the bias due to thresholding. Both simulation studies and real data analysis demonstrate satisfactory performance for obtaining the proper inference for the value function for the optimal dynamic treatment regimes.
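
The role of hard thresholding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single fitted stage with a binary treatment coded -1/+1, a linear Q-function, and ordinary least squares standing in for the penalized high-dimensional estimator. The function names, the threshold `lam`, and the simulated data are all hypothetical, and the bias adjustment mentioned in the abstract is omitted.

```python
import numpy as np

def fit_q(H, A, Y):
    """Least-squares fit of Q(H, A) = H @ beta + A * (H @ psi), A in {-1, +1}.
    Assumption: plain OLS replaces the penalized estimator used for
    high-dimensional covariates."""
    X = np.hstack([H, A[:, None] * H])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    p = H.shape[1]
    return coef[:p], coef[p:]

def thresholded_effect(H, psi, lam):
    """Absolute estimated treatment effect, set to zero when it does not
    exceed the (hypothetical) threshold lam."""
    effect = np.abs(H @ psi)
    return np.where(effect > lam, effect, 0.0)

# Simulated data (illustrative only): patients with R == 0 are
# non-respondents whose true treatment effect is exactly zero.
rng = np.random.default_rng(0)
n = 500
R = rng.integers(0, 2, size=n).astype(float)
H2 = np.column_stack([np.ones(n), R])
A2 = rng.choice([-1.0, 1.0], size=n)
beta2_true = np.array([1.0, 0.5])
psi2_true = np.array([0.0, 1.5])
Y2 = H2 @ beta2_true + A2 * (H2 @ psi2_true) + 0.5 * rng.normal(size=n)

beta2_hat, psi2_hat = fit_q(H2, A2, Y2)

# Pseudo-outcome under the estimated optimal rule: main effect plus the
# thresholded absolute treatment effect.
bonus = thresholded_effect(H2, psi2_hat, lam=0.3)
pseudo = H2 @ beta2_hat + bonus
```

Without the threshold, the estimated effect for non-respondents would be small but nonzero noise, and taking its absolute value would push the pseudo-outcome upward. In a multi-stage setting, `pseudo` would then be regressed on the earlier-stage history in the same way.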
