Heart Institute, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA.
Division of Statistics and Data Science, University of Cincinnati, Cincinnati, Ohio, USA.
Biometrics. 2023 Dec;79(4):3624-3636. doi: 10.1111/biom.13918. Epub 2023 Aug 8.
Missing data are a pervasive issue in observational studies using electronic health records or patient registries. They present unique challenges for statistical inference, especially causal inference, where handling missing data inappropriately can bias causal effect estimates. Besides missing data, observational health data typically contain mixed-type variables (continuous and categorical covariates) whose joint distribution is often too complex to be captured by simple parametric models. Missing values in covariates and outcomes make causal inference even more challenging, yet most standard causal inference approaches either assume fully observed data or begin only after missing values have been imputed in a separate preprocessing stage. To address these problems, we introduce a Bayesian nonparametric causal model to estimate causal effects with missing data. The proposed approach can simultaneously impute missing values, account for multiple outcomes, and estimate causal effects under the potential outcomes framework. We provide three simulation studies to show the performance of our proposed method under complicated data settings whose features are similar to those of our case studies. For example, Simulation Study 3 considers the case where missing values exist in both outcomes and covariates. Two case studies apply our method to evaluate the comparative effectiveness of treatments for chronic disease management in juvenile idiopathic arthritis and cystic fibrosis.