Sun Shuo, Haneuse Sebastien, Levis Alexander W, Lee Catherine, Arterburn David E, Fischer Heidi, Shortreed Susan, Mukherjee Rajarshi
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, United States.
Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States.
Biometrics. 2025 Apr 2;81(2). doi: 10.1093/biomtc/ujaf038.
Causal weighted quantile treatment effects (WQTEs) complement standard mean-focused causal contrasts when interest lies in the tails of the counterfactual distribution. However, existing methods for estimation of and inference on causal WQTEs assume complete data on all relevant factors, which is often not the case in practice, particularly when the data were not collected for research purposes, as with electronic health records (EHRs) and disease registries. Furthermore, outcome data from such sources may be particularly susceptible to being missing-not-at-random (MNAR). This paper proposes double sampling, through which the otherwise missing data are ascertained on a sub-sample of study units, as a strategy to mitigate bias due to MNAR data when estimating causal WQTEs. With the additional data, we present identifying conditions that do not require missingness assumptions in the original data. We then propose a novel inverse-probability weighted estimator and derive its asymptotic properties, both pointwise at specific quantiles and uniformly across quantiles over some compact subset of (0,1), allowing the propensity score and double-sampling probabilities to be estimated. For practical inference, we develop a bootstrap method that can be used for both pointwise and uniform inference. A simulation study examines the finite-sample performance of the proposed estimators. We illustrate the proposed method using EHR data to examine the relative effects of two bariatric surgery procedures on BMI loss 3 years post-surgery.
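To make the idea of an inverse-probability weighted quantile contrast under double sampling more concrete, the following is a minimal sketch and not the authors' estimator. It assumes a trivial covariate weighting (every unit weighted equally in the target population), treats the propensity score `ps` and the double-sampling probability `pi` as already estimated, and uses hypothetical names throughout (`weighted_quantile`, `ipw_quantile_effect`, `r` for "outcome initially observed", `s` for "selected for double sampling"). Units whose outcome is recovered through double sampling are up-weighted by `1 / pi` so that they stand in for the initially missing units.

```python
import numpy as np

def weighted_quantile(y, w, tau):
    """Smallest y whose normalized cumulative weight reaches tau."""
    order = np.argsort(y)
    y_sorted, w_sorted = y[order], w[order]
    cdf = np.cumsum(w_sorted) / np.sum(w_sorted)
    idx = np.searchsorted(cdf, tau, side="left")
    return y_sorted[min(idx, len(y_sorted) - 1)]

def ipw_quantile_effect(y, a, r, s, ps, pi, tau):
    """Illustrative IPW quantile contrast (treated minus control).

    y  : outcomes (NaN where never ascertained)
    a  : binary treatment indicator
    r  : 1 if the outcome was observed initially, else 0
    s  : 1 if an initially missing unit was double sampled, else 0
    ps : assumed-known or estimated P(A = 1 | X)
    pi : assumed-known or estimated double-sampling probability
         (must be positive; only used where r == 0)
    """
    known = (r == 1) | (s == 1)
    # Weight 1 for initially observed units, 1/pi for double-sampled units.
    w_obs = np.where(r == 1, 1.0, s / pi)
    w1 = w_obs * a / ps                # weights for the treated arm
    w0 = w_obs * (1 - a) / (1 - ps)    # weights for the control arm
    m1 = known & (a == 1)
    m0 = known & (a == 0)
    q1 = weighted_quantile(y[m1], w1[m1], tau)
    q0 = weighted_quantile(y[m0], w0[m0], tau)
    return q1 - q0
```

In this sketch, repeating the call over a grid of `tau` values would trace out the contrast across quantiles; the bootstrap-based pointwise and uniform inference described in the abstract is not reproduced here.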