Hu Lin, Yu Jie, Yang Chunxia, Chen Miaoshuang, Tang Zihuan, Liao Rujun, Jike Chunnong, Wang Ju, Wang Ruobing, Liao Qiang, Zhang Tao
Department of Epidemiology and Health Statistics, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, 610041, China.
Sichuan Center for Disease Control and Prevention, Chengdu, Sichuan, 610041, China.
BMC Med Res Methodol. 2025 Aug 1;25(1):185. doi: 10.1186/s12874-025-02627-w.
This study aimed to explore how model misspecification, covariate balance, and extreme weights affect average treatment effect (ATE) estimation in hierarchical data with unmeasured cluster-level confounders when using multilevel propensity score models and inverse probability weighting (IPW).
We simulated 48 hierarchical data scenarios with unmeasured cluster-level confounders and fitted nine ATE estimation strategies. These strategies were combined with IPW using both marginal stabilized weights and cluster-mean stabilized weights, and extreme weights were handled by truncation. The models were then applied to data from patients co-infected with human immunodeficiency virus (HIV) and tuberculosis (TB) in Liangshan Prefecture, Sichuan, China, to estimate the ATE of TB treatment delay on treatment outcomes.
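The weighting step described above can be illustrated with a minimal sketch. It assumes the common definitions in which the marginal stabilized weight uses the overall treatment prevalence as its numerator and the cluster-mean stabilized weight uses the within-cluster treatment prevalence; the abstract does not give the exact formulas. The function names, the percentile cut-offs for truncation, and the simple weighted difference in means are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def stabilized_weights(z, ps, cluster, kind="marginal"):
    """Stabilized IPW weights for a binary treatment z given propensity scores ps.

    kind="marginal":     numerator is the overall proportion treated (assumed definition).
    kind="cluster-mean": numerator is the proportion treated in the unit's own cluster
                         (assumed definition).
    """
    z = np.asarray(z, dtype=float)
    ps = np.asarray(ps, dtype=float)
    cluster = np.asarray(cluster)
    if kind == "marginal":
        p_treat = np.full(z.shape, z.mean())
    elif kind == "cluster-mean":
        # Map each unit to its cluster index, then compute per-cluster treated share.
        _, inv = np.unique(cluster, return_inverse=True)
        p_treat = (np.bincount(inv, weights=z) / np.bincount(inv))[inv]
    else:
        raise ValueError("kind must be 'marginal' or 'cluster-mean'")
    numerator = np.where(z == 1, p_treat, 1.0 - p_treat)
    denominator = np.where(z == 1, ps, 1.0 - ps)
    return numerator / denominator

def truncate_weights(w, lower_pct=1.0, upper_pct=99.0):
    """Truncate weights at the given percentiles (cut-offs are illustrative)."""
    w = np.asarray(w, dtype=float)
    lo, hi = np.percentile(w, [lower_pct, upper_pct])
    return np.clip(w, lo, hi)

def weighted_ate(y, z, w):
    """Difference in weighted outcome means between treated and control units."""
    y, z, w = (np.asarray(a, dtype=float) for a in (y, z, w))
    treated = z == 1
    return (np.average(y[treated], weights=w[treated])
            - np.average(y[~treated], weights=w[~treated]))
```

In this sketch `ps` would come from whichever multilevel propensity score model is being evaluated; the weights are then truncated and passed to `weighted_ate`.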
The simulation study showed that FEM-Marginal tended to generate the most extreme weights, whereas BART-FE-Marginal considerably reduced extreme weights in scenarios with a large number of small clusters. When the data satisfied the positivity assumption, the marginal stabilized weight strategy had the largest absolute percentage bias and root mean square error (RMSE), whereas the cluster-mean stabilized weight strategy had the smallest. In the case study, the different ATE estimation strategies indicated that, among HIV-TB co-infected patients, TB treatment delay was a risk factor for treatment outcome.
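The two performance measures named above can be computed from simulation replicates as in the following sketch. It assumes the usual definitions (bias of the mean estimate relative to the true ATE, in percent, and root mean square error around the true ATE); the abstract does not state the exact formulas, and the function names are hypothetical.

```python
import numpy as np

def abs_percent_bias(estimates, true_ate):
    """Absolute bias of the mean estimate, as a percentage of the true ATE."""
    estimates = np.asarray(estimates, dtype=float)
    return float(100.0 * abs(estimates.mean() - true_ate) / abs(true_ate))

def rmse(estimates, true_ate):
    """Root mean square error of the replicate estimates around the true ATE."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((estimates - true_ate) ** 2)))
```

Each strategy would be scored by passing its ATE estimates across replicates of one scenario together with that scenario's true ATE.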
To better control for unmeasured cluster-level confounders, it is more important to account for cluster characteristics when estimating the ATE. Using Bayesian additive regression trees (BART) to construct multilevel propensity score models, or using cluster-mean stabilized weights, is recommended. If marginal stabilized weights are used, however, methods for handling extreme weights are necessary to improve effect estimation. In hierarchical data with unmeasured cluster-level confounders, reducing extreme weights, weight variability, and model misspecification while improving balance effectively minimizes estimation bias. The case study showed that TB treatment delay remained associated with treatment outcomes even after accounting for unmeasured cluster-level confounders.
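Covariate balance after weighting is often summarized with a weighted standardized mean difference. The sketch below shows one common form, assuming a binary treatment and using the unweighted pooled standard deviation as the denominator; whether the study used this exact metric is not stated in the abstract, and the function name is hypothetical.

```python
import numpy as np

def weighted_smd(x, z, w):
    """Weighted standardized mean difference of covariate x between arms.

    The numerator uses weighted means; the denominator uses the unweighted
    pooled standard deviation (one common convention, assumed here).
    """
    x, z, w = (np.asarray(a, dtype=float) for a in (x, z, w))
    treated, control = z == 1, z == 0
    mean_t = np.average(x[treated], weights=w[treated])
    mean_c = np.average(x[control], weights=w[control])
    pooled_sd = np.sqrt((x[treated].var(ddof=1) + x[control].var(ddof=1)) / 2.0)
    return float((mean_t - mean_c) / pooled_sd)
```

Values closer to zero indicate better balance on that covariate under the chosen weights.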