Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA.
Environmental and Occupational Health, Milken Institute School of Public Health, The George Washington University, Washington, District of Columbia, USA.
Biometrics. 2023 Mar;79(1):264-279. doi: 10.1111/biom.13580. Epub 2021 Nov 10.
This paper is concerned with using multivariate binary observations to estimate the probabilities of unobserved classes with scientific meanings. We focus on the setting where additional information about sample similarities is available and represented by a rooted weighted tree. Every leaf in the given tree contains multiple samples. Shorter distances over the tree between the leaves indicate a priori higher similarity in class probability vectors. We propose a novel data-integrative extension to classical latent class models with tree-structured shrinkage. The proposed approach enables (1) borrowing of information across leaves, (2) estimating data-driven leaf groups with distinct vectors of class probabilities, and (3) individual-level probabilistic class assignment given the observed multivariate binary measurements. We derive and implement a scalable posterior inference algorithm in a variational Bayes framework. Extensive simulations show more accurate estimation of class probabilities than alternatives that suboptimally use the additional sample similarity information. A zoonotic infectious disease application is used to illustrate the proposed approach. The paper concludes with a brief discussion of model limitations and extensions.
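As a rough illustration of item (3), individual-level probabilistic class assignment, the following minimal sketch applies Bayes' rule under a classical latent class model with conditionally independent binary measurements. It is not the paper's method: it omits the tree-structured shrinkage and the variational Bayes algorithm, and it treats the class probabilities and response probabilities as known. The function name `class_posterior`, the symbols `pi` and `theta`, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def class_posterior(y, pi, theta):
    """Posterior class probabilities under a classical latent class model.

    y:     (n, J) array of 0/1 measurements
    pi:    (K,) class probabilities (hypothetical values, not from the paper)
    theta: (K, J) probability of a positive response per class and item
    """
    y = np.asarray(y, dtype=float)
    # Log-likelihood of each row under each class, shape (n, K),
    # assuming items are independent given the class.
    log_lik = y @ np.log(theta).T + (1.0 - y) @ np.log1p(-theta).T
    log_post = np.log(pi) + log_lik
    # Normalize in log space for numerical stability.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

# Hypothetical parameters: K = 2 classes, J = 3 binary items.
pi = np.array([0.6, 0.4])
theta = np.array([[0.9, 0.8, 0.1],
                  [0.2, 0.3, 0.7]])
y = np.array([[1, 1, 0],
              [0, 0, 1]])

posterior = class_posterior(y, pi, theta)
print(posterior)  # one row per sample, one column per class
```

In the setting the abstract describes, the class probability vector would differ across leaf groups of the tree rather than being a single shared `pi`; this sketch only shows the per-sample assignment step once such a vector is given.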