Li Zehang Richard, McCormick Tyler H, Clark Samuel J
Department of Biostatistics, Yale School of Public Health, New Haven, CT.
Department of Statistics and Department of Sociology, University of Washington, Seattle, WA.
Bayesian Anal. 2020 Sep;15(3):781-807. doi: 10.1214/19-ba1172. Epub 2019 Sep 24.
Learning dependence relationships among variables of mixed types provides insights in a variety of scientific settings and is a well-studied problem in statistics. Existing methods, however, typically rely on copious, high-quality data to learn associations accurately. In this paper, we develop a method for scientific settings where learning dependence structure is essential, but data are sparse and have a high fraction of missing values. Specifically, our work is motivated by survey-based cause-of-death assessments known as verbal autopsies (VAs). We propose a Bayesian approach to characterize dependence relationships using a latent Gaussian graphical model that incorporates informative priors on the marginal distributions of the variables. We demonstrate that such information can improve estimation of the dependence structure, especially in settings with little training data. We show that our method can be integrated into existing probabilistic cause-of-death assignment algorithms, where it improves model performance while recovering dependence patterns between symptoms that can inform efficient questionnaire design in future data collection.
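As a rough illustration of the latent Gaussian idea described in the abstract, the sketch below simulates binary symptom indicators by thresholding correlated latent Gaussian variables. It is not the authors' inference procedure: it only generates data, and it assumes the marginal symptom probabilities are fixed, known numbers, whereas the paper places informative priors on them. The function names (`cholesky`, `simulate_symptoms`), the correlation matrix, and the marginal probabilities are all hypothetical choices made for this example. The correlation matrix has the form rho^|i-j|, whose inverse is tridiagonal, so the first and third latent variables are conditionally independent given the second, which is the kind of sparsity a graphical model encodes.

```python
import math
import random
from statistics import NormalDist


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_symptoms(corr, marginals, n, seed=0):
    """Draw n binary vectors by thresholding latent Gaussians with correlation `corr`.

    marginals[j] is the assumed probability that symptom j is present.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Threshold t_j chosen so that P(Z_j > t_j) = marginals[j] for Z_j ~ N(0, 1).
    thresholds = [std_normal.inv_cdf(1.0 - p) for p in marginals]
    low = cholesky(corr)
    d = len(marginals)
    rows = []
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # z = low @ e has covariance low @ low^T = corr.
        z = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(d)]
        rows.append([1 if z[j] > thresholds[j] else 0 for j in range(d)])
    return rows


# Hypothetical three-symptom example with a chain dependence structure.
rho = 0.6
corr = [[rho ** abs(i - j) for j in range(3)] for i in range(3)]
marginals = [0.2, 0.5, 0.1]

samples = simulate_symptoms(corr, marginals, n=20000, seed=1)
freqs = [sum(row[j] for row in samples) / len(samples) for j in range(3)]
joint_01 = sum(row[0] * row[1] for row in samples) / len(samples)

print("empirical marginals:", [round(f, 3) for f in freqs])
print("P(symptom 0 and symptom 1):", round(joint_01, 3))
```

The empirical marginals should sit close to the assumed values, while the joint frequency of the first two symptoms should exceed the product of their marginals, reflecting the positive latent correlation. Recovering the dependence structure from such data, particularly with missing values, is the inference problem the paper addresses and is not attempted here.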