Wang Tao, Zhao Hongyu
Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, Shanghai, China.
SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, Shanghai, China.
Biometrics. 2017 Sep;73(3):792-801. doi: 10.1111/biom.12654. Epub 2017 Jan 23.
Understanding the factors that alter the composition of the human microbiota may help inform personalized healthcare strategies and identify therapeutic drug targets. In many sequencing studies, microbial communities are characterized by a list of taxa, their counts, and their evolutionary relationships represented by a phylogenetic tree. In this article, we consider an extension of the Dirichlet multinomial distribution, called the Dirichlet-tree multinomial distribution, for multivariate, over-dispersed, and tree-structured count data. To model the relationships between these counts and a set of covariates, we propose the Dirichlet-tree multinomial regression model, for which we develop a penalized likelihood method for estimating parameters and selecting covariates. For efficient optimization, we adopt the accelerated proximal gradient approach. Simulation studies are presented to demonstrate the good performance of the proposed procedure. An analysis of a data set relating dietary nutrients to bacterial counts shows that incorporating the tree structure into the model helps increase predictive power.
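As a rough illustration of the distribution named above, the sketch below evaluates a Dirichlet-tree multinomial log-likelihood as a sum of Dirichlet-multinomial terms, one per internal node of the tree, applied to the counts falling under each child. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the dictionary-based tree encoding, the toy taxa, and the fixed parameter values are all hypothetical, and the covariate-dependent parameters and penalty used in the regression model are omitted.

```python
from math import lgamma


def dm_logpmf(counts, alpha):
    """Log-pmf of a Dirichlet-multinomial for one vector of counts."""
    n = sum(counts)
    a = sum(alpha)
    out = lgamma(n + 1) + lgamma(a) - lgamma(n + a)
    for x, al in zip(counts, alpha):
        out += lgamma(x + al) - lgamma(al) - lgamma(x + 1)
    return out


def subtree_total(node, tree, leaf_counts):
    """Total count of all leaves below `node` (or the leaf's own count)."""
    if node in leaf_counts:
        return leaf_counts[node]
    return sum(subtree_total(child, tree, leaf_counts) for child in tree[node])


def dtm_logpmf(tree, alpha, leaf_counts):
    """Sum of Dirichlet-multinomial log-pmfs over the internal nodes.

    tree:        internal node -> list of children (hypothetical encoding)
    alpha:       internal node -> {child: positive parameter}
    leaf_counts: leaf name -> observed count
    """
    total = 0.0
    for node, children in tree.items():
        child_counts = [subtree_total(c, tree, leaf_counts) for c in children]
        child_alpha = [alpha[node][c] for c in children]
        total += dm_logpmf(child_counts, child_alpha)
    return total


# Toy example (assumed values): three taxa, with t1 and t2 sharing a parent.
example_tree = {"root": ["A", "t3"], "A": ["t1", "t2"]}
example_alpha = {
    "root": {"A": 2.0, "t3": 1.0},
    "A": {"t1": 0.5, "t2": 1.5},
}
example_counts = {"t1": 3, "t2": 1, "t3": 2}

print(dtm_logpmf(example_tree, example_alpha, example_counts))
```

When the tree has a single internal node whose children are all the taxa, this reduces to the ordinary Dirichlet-multinomial log-likelihood, which is the sense in which the tree version is an extension.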