Biosciences Department, Faculty of Sciences, Technology and Engineering, University of Vic - Central University of Catalonia, Carrer de La Laura, 13, 08500, Vic, Spain.
Mathematical Department, UPC-Barcelona Tech, Barcelona, Spain.
BMC Bioinformatics. 2023 Mar 6;24(1):82. doi: 10.1186/s12859-023-05205-3.
One of the main challenges of microbiome analysis is its compositional nature, which, if ignored, can lead to spurious results. Addressing the compositional structure of microbiome data is particularly critical in longitudinal studies, where abundances measured at different times can correspond to different sub-compositions.
We developed coda4microbiome, a new R package for analyzing microbiome data within the Compositional Data Analysis (CoDA) framework in both cross-sectional and longitudinal studies. The aim of coda4microbiome is prediction: more specifically, the method is designed to identify a model (microbial signature) containing the minimum number of features with the maximum predictive power. The algorithm relies on the analysis of log-ratios between pairs of components, and variable selection is addressed through penalized regression on the "all-pairs log-ratio model", the model containing all possible pairwise log-ratios. For longitudinal data, the algorithm infers dynamic microbial signatures by performing penalized regression over a summary of the log-ratio trajectories (the area under these trajectories). In both cross-sectional and longitudinal studies, the inferred microbial signature is expressed as the (weighted) balance between two groups of taxa: those that contribute positively to the microbial signature and those that contribute negatively. The package provides several graphical representations that facilitate the interpretation of the analysis and of the identified microbial signatures. We illustrate the new method with data from a Crohn's disease study (cross-sectional data) and from a study of the developing microbiome of infants (longitudinal data).
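The quantities named above can be illustrated with a minimal Python sketch. This is not the coda4microbiome API (the package is written in R): the function names, the abundances, and the pair coefficients are all hypothetical. The coefficients are simply assumed rather than fitted by penalized regression, and abundances are assumed strictly positive (zeros imputed beforehand). The sketch shows how a weighted sum of pairwise log-ratios collapses algebraically into per-taxon weights that sum to zero, which is what allows the signature to be read as a balance between a positive and a negative group of taxa, and how a log-ratio trajectory can be summarized by its area using the trapezoidal rule.

```python
import math
from itertools import combinations


def all_pairs_log_ratios(x):
    """Map each pair (i, j) with i < j to log(x[i] / x[j]).

    Assumes strictly positive abundances (zeros imputed beforehand).
    """
    return {(i, j): math.log(x[i] / x[j])
            for i, j in combinations(range(len(x)), 2)}


def signature_score(x, pair_coefs):
    """Weighted sum of the selected pairwise log-ratios for one sample."""
    ratios = all_pairs_log_ratios(x)
    return sum(beta * ratios[pair] for pair, beta in pair_coefs.items())


def taxa_weights(pair_coefs, n_taxa):
    """Collapse pair coefficients into one weight per taxon.

    beta * log(x_i / x_j) = beta * log(x_i) - beta * log(x_j), so each pair
    adds beta to taxon i and subtracts beta from taxon j. The resulting
    weights always sum to zero.
    """
    theta = [0.0] * n_taxa
    for (i, j), beta in pair_coefs.items():
        theta[i] += beta
        theta[j] -= beta
    return theta


def trajectory_area(times, values):
    """Area under a piecewise-linear trajectory (trapezoidal rule)."""
    area = 0.0
    for k in range(1, len(times)):
        area += (times[k] - times[k - 1]) * (values[k] + values[k - 1]) / 2.0
    return area


# Hypothetical cross-sectional sample with four taxa.
sample = [10.0, 20.0, 40.0, 80.0]

# Illustrative coefficients for three selected pairs; a penalized
# regression would set most of the other pairs to zero.
pair_coefs = {(0, 1): 0.5, (0, 3): 0.25, (2, 3): -1.0}

theta = taxa_weights(pair_coefs, len(sample))
positive_group = [k for k, w in enumerate(theta) if w > 0]
negative_group = [k for k, w in enumerate(theta) if w < 0]

# Because the weights sum to zero, rescaling the sample (for example a
# different sequencing depth) leaves the score unchanged.
score = signature_score(sample, pair_coefs)
rescaled_score = signature_score([3.0 * v for v in sample], pair_coefs)

# Hypothetical longitudinal data: one subject, two taxa, three visits.
times = [0.0, 1.0, 3.0]
visits = [[10.0, 20.0], [10.0, 10.0], [20.0, 10.0]]
trajectory = [math.log(v[0] / v[1]) for v in visits]
area = trajectory_area(times, trajectory)
```

In this toy example, taxa 0 and 3 receive positive weights and taxa 1 and 2 negative ones, and the score is identical before and after rescaling. In the longitudinal case, one such area per subject and per pair would play the role that the log-ratio itself plays in the cross-sectional model.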
coda4microbiome is a new algorithm for identifying microbial signatures in both cross-sectional and longitudinal studies. The algorithm is implemented as an R package that is available at CRAN (https://cran.r-project.org/web/packages/coda4microbiome/) and is accompanied by a vignette with a detailed description of the functions. The website of the project contains several tutorials: https://malucalle.github.io/coda4microbiome/.