A Zero-Inflated Latent Dirichlet Allocation Model for Microbiome Studies.

Author information

Deek Rebecca A, Li Hongzhe

Affiliation

Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States.

Publication information

Front Genet. 2021 Jan 22;11:602594. doi: 10.3389/fgene.2020.602594. eCollection 2020.

Abstract

The human microbiome consists of a community of microbes in varying abundances and has been shown to be associated with many diseases. An important first step in many microbiome studies is to identify possible distinct microbial communities in a given data set and to identify the important bacterial taxa that characterize these communities. The data from typical microbiome studies are high-dimensional count data with excessive zeros due to both the absence of species (structural zeros) and low sequencing depth or dropout (sampling zeros). Although methods have been developed for identifying microbial communities based on mixture models of counts, these methods do not account for the excessive zeros observed in the data and do not differentiate structural zeros from sampling zeros. In this paper, we introduce a zero-inflated Latent Dirichlet Allocation model (zinLDA) for sparse count data observed in microbiome studies. zinLDA builds on the flexible Latent Dirichlet Allocation model and allows for zero inflation in observed counts. We develop an efficient Markov chain Monte Carlo (MCMC) sampling procedure to fit the model. Results from our simulations show that zinLDA provides better fits to the data and is able to separate structural zeros from sampling zeros. We apply zinLDA to the data set from the American Gut Project and identify microbial communities characterized by different bacterial genera.
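The distinction between structural and sampling zeros can be illustrated with a small simulation. The sketch below is not the authors' zinLDA specification or their MCMC sampler; it is a simplified LDA-style generative process under stated assumptions: each taxon is structurally absent from each community with a hypothetical probability `pi`, the remaining taxa receive a plain Dirichlet distribution, and every sample has the same sequencing depth. The function name and all parameter names are illustrative.

```python
import numpy as np

def simulate_zero_inflated_lda(n_samples=50, n_taxa=30, n_communities=3,
                               depth=200, pi=0.4, alpha=1.0, beta=0.5, seed=0):
    """Simulate counts from a simplified zero-inflated LDA-style model.

    Assumption (not from the paper): structural absence is an independent
    Bernoulli(pi) draw per community-taxon pair, and the taxa that remain
    follow an ordinary Dirichlet(beta) distribution.
    """
    rng = np.random.default_rng(seed)

    # structural[k, v] is True when taxon v is truly absent from community k.
    structural = rng.random((n_communities, n_taxa)) < pi
    structural[:, 0] = False  # guarantee at least one present taxon per community

    # Community-specific taxa distributions, with exact zeros at structural positions.
    phi = np.zeros((n_communities, n_taxa))
    for k in range(n_communities):
        present = ~structural[k]
        phi[k, present] = rng.dirichlet(np.full(present.sum(), beta))

    # Sample-specific community proportions.
    theta = rng.dirichlet(np.full(n_communities, alpha), size=n_samples)

    counts = np.zeros((n_samples, n_taxa), dtype=int)
    for d in range(n_samples):
        # Split the reads of sample d across communities, then across taxa.
        reads_per_community = rng.multinomial(depth, theta[d])
        for k in range(n_communities):
            counts[d] += rng.multinomial(reads_per_community[k], phi[k])
    return counts, structural, phi, theta
```

In this sketch, a zero count for a taxon that is structurally absent from every community is a structural zero, while a zero count for a taxon with positive probability in some community arises only from finite sequencing depth, which is a sampling zero. Comparing `counts == 0` against `structural.all(axis=0)` separates the two cases in the simulated data.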


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a75/7862749/97b95af5e704/fgene-11-602594-g0001.jpg
