Wu Peng, Zeng Donglin, Wang Yuanjia
Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10032.
Department of Biostatistics, University of North Carolina at Chapel Hill.
J Am Stat Assoc. 2020;115(529):380-392. doi: 10.1080/01621459.2018.1549050. Epub 2019 Apr 23.
Current guidelines for treatment decision making largely rely on data from randomized controlled trials (RCTs) studying average treatment effects. They may be inadequate to make individualized treatment decisions in real-world settings. Large-scale electronic health records (EHR) provide opportunities to fulfill the goals of personalized medicine and learn individualized treatment rules (ITRs) depending on patient-specific characteristics from real-world patient data. In this work, we tackle challenges with EHRs and propose a machine learning approach based on matching (M-learning) to estimate optimal ITRs from EHRs. This new learning method performs matching instead of inverse probability weighting as commonly used in many existing methods for estimating ITRs to more accurately assess individuals' treatment responses to alternative treatments and alleviate confounding. Matching-based value functions are proposed to compare matched pairs under a unified framework, where various types of outcomes for measuring treatment response (including continuous, ordinal, and discrete outcomes) can easily be accommodated. We establish the Fisher consistency and convergence rate of M-learning. Through extensive simulation studies, we show that M-learning outperforms existing methods when propensity scores are misspecified or when unmeasured confounders are present in certain scenarios. Lastly, we apply M-learning to estimate optimal personalized second-line treatments for type 2 diabetes patients to achieve better glycemic control or reduce major complications using EHRs from New York Presbyterian Hospital.
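The matching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' estimator: the abstract does not give the form of the matching-based value function, so the one-nearest-neighbor Euclidean pairing, the weighting by absolute outcome difference, the assumption that larger outcomes are better, and the names `nearest_opposite_arm` and `matched_value` are all assumptions made here for illustration. Each subject is paired with the closest subject who received the other treatment, and a candidate rule is credited when its recommendation agrees with the treatment of the pair member who had the better outcome.

```python
import numpy as np


def nearest_opposite_arm(X, A):
    """For each subject, return the index of the closest subject
    (Euclidean distance on X) who received the other treatment.
    Assumes both treatment arms are non-empty."""
    X = np.asarray(X, dtype=float)
    A = np.asarray(A)
    # Pairwise squared distances between all subjects.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Forbid matches within the same arm (this also excludes self-matches).
    d2[A[:, None] == A[None, :]] = np.inf
    return d2.argmin(axis=1)


def matched_value(rule, X, A, Y, match):
    """Illustrative matched-pair score for a candidate rule.
    Assumption: treatments are coded -1/+1 and larger Y is better."""
    X = np.asarray(X, dtype=float)
    A = np.asarray(A)
    Y = np.asarray(Y, dtype=float)
    d = rule(X)                      # recommended treatment per subject
    Yj, Aj = Y[match], A[match]      # outcome and treatment of the match
    # Treatment received by whichever pair member did better.
    better = np.where(Y >= Yj, A, Aj)
    # Weight each pair by how much the outcomes differ (an assumption).
    gain = np.abs(Y - Yj)
    return float(np.mean(gain * (d == better)))


# Toy data: one covariate, four subjects, two per arm.
X = np.array([[0.0], [0.1], [1.0], [1.1]])
A = np.array([1, -1, 1, -1])
Y = np.array([2.0, 1.0, 0.0, 3.0])

match = nearest_opposite_arm(X, A)


def always_plus(X):
    return np.ones(len(X), dtype=int)


def threshold_rule(X):
    # Recommend +1 when the covariate is below 0.5, otherwise -1.
    return np.where(X[:, 0] < 0.5, 1, -1)


v_always = matched_value(always_plus, X, A, Y, match)
v_threshold = matched_value(threshold_rule, X, A, Y, match)
```

On this toy data the covariate-dependent rule scores higher than the one-size-fits-all rule, which is the kind of comparison a matching-based value function is meant to support. A practical implementation would need a principled matching scheme and a search over a class of rules rather than a comparison of two hand-written ones.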