Xiao Jian, Chen Li, Johnson Stephen, Yu Yue, Zhang Xianyang, Chen Jun
Division of Biomedical Statistics and Informatics and Center for Individualized Medicine, Mayo Clinic, Rochester, MN, United States.
School of Statistics and Mathematics, Zhongnan University of Economics and Law, Hubei, China.
Front Microbiol. 2018 Jun 27;9:1391. doi: 10.3389/fmicb.2018.01391. eCollection 2018.
Recent human microbiome studies have revealed an essential role of the human microbiome in health and disease, opening up the possibility of building microbiome-based predictive models for individualized medicine. One unique characteristic of microbiome data is the existence of a phylogenetic tree that relates all the microbial species. It has frequently been observed that a cluster or clusters of bacteria at varying phylogenetic depths are associated with some clinical or biological outcome due to shared biological function. Moreover, in many cases, we observe a community-level change, where a large number of functionally interdependent species are associated with the outcome. We thus develop "glmmTree," a prediction method based on a generalized linear mixed model framework, for capturing clustered and dense microbiome signals. glmmTree uses the similarity between microbiomes, which is defined based on the microbiome composition and the phylogenetic tree, to predict the outcome. The effects of other predictive variables (e.g., age, sex) can be incorporated readily in the regression framework. Additional tuning parameters enable a data-adaptive approach to capture signals at different phylogenetic depths and abundance levels. Simulation studies and real data applications demonstrated that "glmmTree" outperformed existing methods in the dense and clustered signal scenarios.
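The idea of predicting an outcome from a tree-informed similarity between microbiomes can be illustrated with a minimal sketch. This is not the glmmTree implementation. It assumes, for illustration only, that taxon-level correlation decays exponentially with patristic distance, C = exp(-2 * rho * D), that sample similarity is K = X C X^T, and that the variance ratio `lam` is fixed rather than estimated. The names `phylo_kernel`, `fit_predict`, `rho`, and `lam` are hypothetical, and the data are simulated.

```python
import numpy as np

def phylo_kernel(X_a, X_b, D, rho=1.0):
    """Sample similarity X_a C X_b^T, with the assumed form C = exp(-2*rho*D)."""
    C = np.exp(-2.0 * rho * D)  # closely related taxa get correlation near 1
    return X_a @ C @ X_b.T

def fit_predict(X_train, y_train, X_test, D, rho=1.0, lam=1.0):
    """Intercept-only mixed-model style prediction (BLUP / kernel ridge form).

    lam plays the role of a residual-to-random-effect variance ratio;
    it is held fixed here, which is an assumption of this sketch.
    """
    mu = y_train.mean()
    K = phylo_kernel(X_train, X_train, D, rho)
    K_star = phylo_kernel(X_test, X_train, D, rho)
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - mu)
    return mu + K_star @ alpha

# Toy example: 4 taxa in two clades (patristic distances are made up).
D = np.array([[0.0, 0.2, 1.0, 1.0],
              [0.2, 0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0, 0.2],
              [1.0, 1.0, 0.2, 0.0]])
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(4), size=25)                  # relative abundances
y = 2.0 * X[:, :2].sum(axis=1) + rng.normal(0, 0.1, 25)  # clade-level signal
preds = fit_predict(X[:20], y[:20], X[20:], D, rho=1.0, lam=0.1)
```

In this sketch `rho` controls how deep in the tree the similarity is shared: at `rho=0` every taxon is treated as identical, while large `rho` treats taxa as nearly independent. Choosing it by cross-validation would be one way to make the fit data-adaptive.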