The LZIP: A Bayesian latent factor model for correlated zero-inflated counts.

Author information

Neelon Brian, Chung Dongjun

Affiliation

Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, U.S.A.

Publication information

Biometrics. 2017 Mar;73(1):185-196. doi: 10.1111/biom.12558. Epub 2016 Jul 5.

DOI:10.1111/biom.12558

PMID:27378066

Abstract

Motivated by a study of molecular differences among breast cancer patients, we develop a Bayesian latent factor zero-inflated Poisson (LZIP) model for the analysis of correlated zero-inflated counts. The responses are modeled as independent zero-inflated Poisson distributions conditional on a set of subject-specific latent factors. For each outcome, we express the LZIP model as a function of two discrete random variables: the first captures the propensity to be in an underlying "at-risk" state, while the second represents the count response conditional on being at risk. The latent factors and loadings are assigned conditionally conjugate gamma priors that accommodate overdispersion and dependence among the outcomes. For posterior computation, we propose an efficient data-augmentation algorithm that relies primarily on easily sampled Gibbs steps. We conduct simulation studies to investigate both the inferential properties of the model and the computational capabilities of the proposed sampling algorithm. We apply the method to an analysis of breast cancer genomics data from The Cancer Genome Atlas.
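The two-part structure described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact parameterization, so the function name `simulate_lzip`, a single gamma-distributed latent factor per subject with mean 1, fixed per-outcome at-risk probabilities, and fixed per-outcome base rates are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_lzip(n_subjects, pi_at_risk, base_rate, shape=2.0, seed=0):
    """Simulate correlated zero-inflated Poisson counts (illustrative only).

    Assumptions (not taken from the paper): one shared latent factor per
    subject, fixed at-risk probabilities, and fixed base rates per outcome.
    """
    rng = np.random.default_rng(seed)
    pi_at_risk = np.asarray(pi_at_risk, dtype=float)
    base_rate = np.asarray(base_rate, dtype=float)
    n_outcomes = pi_at_risk.shape[0]

    # Subject-specific latent factor, gamma distributed with mean 1.
    # Sharing it across outcomes induces overdispersion and dependence.
    eta = rng.gamma(shape, 1.0 / shape, size=(n_subjects, 1))

    # First discrete variable: indicator of the underlying "at-risk" state.
    at_risk = rng.binomial(1, pi_at_risk, size=(n_subjects, n_outcomes))

    # Second discrete variable: Poisson count conditional on being at risk,
    # with the mean scaled by the subject's latent factor.
    counts = rng.poisson(eta * base_rate)

    # Observed response: a structural zero unless the subject is at risk.
    y = at_risk * counts
    return y, at_risk, eta


y, at_risk, eta = simulate_lzip(5000, pi_at_risk=[0.7, 0.7], base_rate=[3.0, 5.0])
print("zero fraction per outcome:", (y == 0).mean(axis=0))
print("between-outcome correlation:", np.corrcoef(y[:, 0], y[:, 1])[0, 1])
```

Conditional on the latent factor, the outcomes in this sketch are independent zero-inflated Poisson draws; marginally they are correlated because every outcome for a subject shares the same factor. Posterior inference, such as the data-augmentation Gibbs sampler mentioned in the abstract, is not attempted here.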
