Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, 38105, TN, USA.
Health Informatics Institute, University of South Florida, Tampa, 33620, FL, USA.
BMC Genomics. 2022 Sep 19;23(1):661. doi: 10.1186/s12864-022-08890-1.
To identify operational taxonomic units (OTUs) that signal disease onset in an observational study, a powerful strategy is to select participants in matched sets, profile their metagenomes over time, and then analyze the resulting trajectories. Existing trajectory analyses model an individual OTU or the whole microbial community without adjusting for within-community correlation or for latent factors specific to each matched set.
We proposed a joint model with matching and regularization (JMR) to detect OTU-specific trajectories predictive of host disease status. Heterogeneity between and within matched sets, in both OTU relative abundance and disease risk, was modeled by nested random effects. The inherent negative correlation in microbiota composition was adjusted for by including the top-correlated taxa as regularized longitudinal covariates; these taxa were pre-selected by Bray-Curtis distance and elastic net regression. We designed a simulation pipeline that generates both true biomarkers of disease onset and pseudo biomarkers caused by compositionality. In a simulation study generating temporal high-dimensional metagenomic counts with a random intercept or slope, JMR effectively controlled false discoveries and pseudo biomarkers. Applying the competing methods to the simulated data and to the TEDDY cohort showed that JMR outperformed the other methods and identified important taxa in infants' fecal samples whose dynamics preceded host disease status.
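The compositionality problem mentioned above can be illustrated with a small sketch. This is not the JMR model and not the mtradeR API; the counts, taxon names, and helper functions are all hypothetical. It shows how a taxon whose absolute count never changes can still appear to decline in relative abundance when another taxon grows (a pseudo biomarker), and how Bray-Curtis dissimilarity between relative-abundance trajectories could be used to rank candidate covariate taxa. Ranking by smallest dissimilarity is an assumption made here for illustration only.

```python
def relative_abundance(counts):
    """Convert one sample's absolute counts into proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den > 0 else 0.0


# Hypothetical absolute counts for taxa A, B, C at four time points.
# Only taxon A truly changes; B and C are constant in absolute terms.
absolute = [
    [100, 200, 700],
    [200, 200, 700],
    [400, 200, 700],
    [800, 200, 700],
]

# Relative abundance per sample, then one trajectory per taxon.
rel = [relative_abundance(sample) for sample in absolute]
traj = {name: [sample[j] for sample in rel] for j, name in enumerate("ABC")}

# Taxon B declines in relative abundance only because A grows.
print("B relative abundance over time:", [round(x, 3) for x in traj["B"]])

# Rank the other taxa by dissimilarity to the target taxon B
# (illustrative pre-selection step; the ranking rule is assumed).
target = "B"
ranking = sorted(
    (name for name in traj if name != target),
    key=lambda name: bray_curtis(traj[target], traj[name]),
)
print("Candidate covariate taxa for B, closest first:", ranking)
```

In a real analysis the pre-selected taxa would then enter a regularized regression, such as the elastic net step the abstract describes, rather than being used directly.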
JMR is a robust framework that jointly models taxon-specific trajectories and host disease status for matched participants without transforming relative abundance, improving the power to detect disease-associated microbial features in certain scenarios. JMR is available in the R package mtradeR at https://github.com/qianli10000/mtradeR.