Department of Biostatistics and Health Data Science, Indiana University School of Medicine and Fairbanks School of Public Health, Indianapolis, Indiana, USA.
Department of Mathematical Sciences, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana, USA.
Stat Med. 2021 May 20;40(11):2713-2752. doi: 10.1002/sim.8926. Epub 2021 Mar 19.
Estimation of heterogeneous treatment effects is an essential component of precision medicine. Model-based and algorithm-based methods have been developed within the causal inference framework to achieve valid estimation and inference. Existing methods such as the A-learner, R-learner, modified covariates method (with and without efficiency augmentation), inverse propensity score weighting, and augmented inverse propensity score weighting have been proposed mostly under the squared error loss function. The performance of these methods in the presence of data irregularity and high dimensionality, such as that encountered in electronic health record (EHR) data analysis, has been less studied. In this research, we describe a general formulation that unifies many of the existing learners through a common score function. The new formulation allows the incorporation of least absolute deviation (LAD) regression and dimension reduction techniques to counter the challenges in EHR data analysis. We show that, under a set of mild regularity conditions, the resulting estimator is asymptotically normal. Within this framework, we propose two specific estimators for EHR analysis based on weighted LAD with simultaneous penalties for sparsity and smoothness. Our simulation studies show that the proposed methods are more robust to outliers under various circumstances. We use these methods to assess the blood pressure-lowering effects of two commonly used antihypertensive therapies.
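To make the contrast between squared error loss and LAD concrete, the following is a toy sketch only, not the estimators proposed in the paper (which involve covariates, estimated propensity scores, and sparsity and smoothness penalties). It assumes a randomized trial with known propensity 0.5, a centered outcome, a constant treatment effect, and treatment coded as -1/+1. Under those assumptions the inverse-propensity-weighted fit of a constant effect reduces to a weighted mean of 2*T*Y under squared error loss and to a weighted median of 2*T*Y under LAD. All function names, the simulated data, and the outlier mechanism (adding 50 to a small share of outcomes) are hypothetical choices made for illustration.

```python
import random


def weighted_mean(values, weights):
    """Minimizer of sum_i w_i * (v_i - d)**2 over d."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_median(values, weights):
    """A minimizer of sum_i w_i * |v_i - d| over d (lower weighted median)."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(weights)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]


def simulate_trial(n, effect, outlier_rate, seed):
    """Hypothetical randomized trial with occasional gross recording errors."""
    rng = random.Random(seed)
    treatment, outcome = [], []
    for _ in range(n):
        t = rng.choice([-1, 1])                  # randomized, P(T = 1) = 0.5
        y = 0.5 * t * effect + rng.gauss(0.0, 1.0)
        if rng.random() < outlier_rate:
            y += 50.0                            # assumed outlier mechanism
        treatment.append(t)
        outcome.append(y)
    return treatment, outcome


TRUE_EFFECT = 2.0
PROPENSITY = 0.5  # known by design in this toy setting

treatment, outcome = simulate_trial(n=2000, effect=TRUE_EFFECT,
                                    outlier_rate=0.05, seed=1)

# With T in {-1, +1}, |Y - (T/2) * d| = 0.5 * |2*T*Y - d|, so fitting a
# constant effect d amounts to locating the center of 2*T*Y.
transformed = [2.0 * t * y for t, y in zip(treatment, outcome)]
weights = [1.0 / PROPENSITY] * len(transformed)  # inverse propensity weights

ls_estimate = weighted_mean(transformed, weights)     # squared error loss
lad_estimate = weighted_median(transformed, weights)  # LAD loss

print(f"true effect:            {TRUE_EFFECT:.3f}")
print(f"squared-error estimate: {ls_estimate:.3f}")
print(f"LAD estimate:           {lad_estimate:.3f}")
```

In this sketch each contaminated outcome moves the squared-error estimate in proportion to its size, whereas the LAD estimate depends only on the ordering of the transformed outcomes, so a small share of gross errors shifts it far less. With non-constant propensity scores the same two functions apply once `weights` holds the observation-specific inverse propensities.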