Wadsworth W Duncan, Argiento Raffaele, Guindani Michele, Galloway-Pena Jessica, Shelburne Samuel A, Vannucci Marina
Department of Statistics, Rice University, Houston, TX, USA.
ESOMAS Department, University of Torino and Collegio Carlo Alberto, Torino, Italy.
BMC Bioinformatics. 2017 Feb 8;18(1):94. doi: 10.1186/s12859-017-1516-0.
The human microbiome has been variously associated with the immune-regulatory mechanisms involved in the prevention or development of many non-infectious human diseases, such as autoimmunity, allergy and cancer. Integrative approaches that aim to associate the composition of the human microbiome with other available information, such as clinical covariates and environmental predictors, are paramount to developing a more complete understanding of the role of the microbiome in disease development.
In this manuscript, we propose a Bayesian Dirichlet-multinomial regression model that uses spike-and-slab priors to select significant associations between a set of available covariates and taxa from a microbiome abundance table. The approach allows straightforward incorporation of the covariates through a log-linear regression parametrization of the parameters of the Dirichlet-multinomial likelihood. Inference is conducted through a Markov chain Monte Carlo algorithm, and selection of the significant covariates is based on the posterior probabilities of inclusion, with a threshold chosen to control the Bayesian false discovery rate. We design a simulation study to evaluate the performance of the proposed method, and then apply our model to a publicly available dataset from the Human Microbiome Project that associates taxa abundances with KEGG orthology pathways. The method is implemented in purpose-built R code, which has been made publicly available.
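The two computational ingredients named above, the log-linear link into the Dirichlet-multinomial likelihood and the Bayesian false discovery rate threshold on posterior probabilities of inclusion (PPIs), can be illustrated with a short sketch. This is not the authors' R implementation. It assumes that each taxon j has an intercept alpha_j and that the Dirichlet-multinomial concentration for sample i is gamma_ij = exp(alpha_j + sum_p x_ip * phi_pj), where a zero phi_pj plays the role of the spike (covariate p excluded for taxon j). It also assumes a standard Bayesian FDR rule: rank covariate-taxon pairs by PPI and keep the largest set whose mean (1 - PPI) stays at or below the target rate. The function names (`dm_concentrations`, `dm_log_likelihood`, `select_by_bayesian_fdr`) are hypothetical and exist only for this illustration.

```python
import math
from typing import List, Sequence


def dm_concentrations(x: Sequence[float], alpha: Sequence[float],
                      phi: Sequence[Sequence[float]]) -> List[float]:
    """Assumed log-linear link: gamma_j = exp(alpha_j + sum_p x[p] * phi[p][j]).

    phi[p][j] is the coefficient of covariate p for taxon j; an exact zero
    corresponds to the 'spike' of a spike-and-slab prior (pair excluded).
    """
    return [
        math.exp(alpha[j] + sum(x[p] * phi[p][j] for p in range(len(x))))
        for j in range(len(alpha))
    ]


def dm_log_likelihood(y: Sequence[int], gamma: Sequence[float]) -> float:
    """Dirichlet-multinomial log-pmf of one count vector y given concentrations gamma.

    log p(y) = log n! - sum_j log y_j! + log G(g+) - log G(n + g+)
               + sum_j [log G(y_j + g_j) - log G(g_j)],  with g+ = sum_j g_j.
    Computed with lgamma on the log scale to avoid overflow for large counts.
    """
    n = sum(y)
    g_plus = sum(gamma)
    ll = math.lgamma(n + 1) + math.lgamma(g_plus) - math.lgamma(n + g_plus)
    for y_j, g_j in zip(y, gamma):
        ll += math.lgamma(y_j + g_j) - math.lgamma(g_j) - math.lgamma(y_j + 1)
    return ll


def select_by_bayesian_fdr(ppi: Sequence[float], target: float = 0.05) -> List[int]:
    """Indices of the largest top-ranked set whose estimated Bayesian FDR <= target.

    The estimated FDR of the k highest-PPI pairs is the mean of (1 - PPI) over
    them. That running mean can only grow as lower PPIs are added, so the scan
    stops at the first k where it exceeds the target.
    """
    order = sorted(range(len(ppi)), key=lambda i: ppi[i], reverse=True)
    selected: List[int] = []
    expected_false = 0.0
    for k, i in enumerate(order, start=1):
        expected_false += 1.0 - ppi[i]
        if expected_false / k > target:
            break
        selected.append(i)
    return sorted(selected)
```

For example, PPIs of `[0.99, 0.2, 0.95, 0.5, 0.9]` with a 5% target keep only pairs 0 and 2: adding the third-ranked pair (PPI 0.9) would raise the estimated FDR to about 0.053. In an actual analysis the PPIs would be the fraction of MCMC iterations in which each inclusion indicator equals one.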
Our method compares favorably in simulations with several recently proposed approaches for similarly structured data, showing increased accuracy and reduced false positive and false negative rates. In the application to the Human Microbiome Project data, a close evaluation of the biological significance of our findings confirms associations already reported in the literature.