Atlas Biomed Group-Knomx LLC, Moscow, Russia.
Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia.
mSystems. 2022 Jun 28;7(3):e0015522. doi: 10.1128/msystems.00155-22. Epub 2022 May 9.
Linking microbiome composition, obtained from metagenomic or 16S rRNA sequencing, to various factors poses a real challenge. The compositional approach to such data is well described: the isometric log-ratio (ILR) transform treats relative abundances correctly. Most existing compositional methods differ in their particular choice of transform. Although this choice does not influence a model's predictions, it determines the subset of balances between groups of microbial taxa that is subsequently used to interpret shifts in composition. We propose a method that interprets these shifts independently of the initial choice of ILR coordinates by finding the nearest single-balance shift. Here we describe the application of the method to regression, classification, and principal balance analysis of compositional data. Analytical treatment and cross-validation show that the approach provides the least-squares estimate of a single-balance shift associated with a factor, with possible adjustment for covariates. For classification and principal balance analysis, the nearest balance method gives results comparable to those of other compositional tools. Its advantages are that it makes no assumptions about the number of taxa included in the balance and that its computational cost is low. The method is implemented in the R package NearestBalance. The method proposed here extends the range of compositional methods that provide interpretation of classical statistical tools applied to data converted to ILR coordinates. It provides a strictly optimal solution in several special cases. The approach is universally applicable to compositional data of any nature, including microbiome data sets.
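To make the notion of a balance concrete, the following minimal sketch computes a single balance between two groups of taxa, using the standard definition from compositional data analysis: a normalized log-ratio of the geometric means of the two groups. It does not reproduce the NearestBalance package or its API. The function names, taxon names, and abundance values are hypothetical and chosen only for illustration, and all abundances are assumed to be strictly positive (zeros would need separate handling, for example a pseudocount).

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(abundances, numerator, denominator):
    """Balance between two disjoint groups of taxa.

    b = sqrt(r * s / (r + s)) * ln(g(numerator) / g(denominator)),
    where r and s are the group sizes and g is the geometric mean.
    """
    r, s = len(numerator), len(denominator)
    g_num = geometric_mean([abundances[t] for t in numerator])
    g_den = geometric_mean([abundances[t] for t in denominator])
    return math.sqrt(r * s / (r + s)) * math.log(g_num / g_den)


# Hypothetical relative abundances for one sample (illustrative values only).
sample = {
    "Bacteroides": 0.40,
    "Prevotella": 0.10,
    "Faecalibacterium": 0.30,
    "Roseburia": 0.20,
}
group_a = ["Bacteroides", "Prevotella"]
group_b = ["Faecalibacterium", "Roseburia"]

b = balance(sample, group_a, group_b)

# Rescaling every abundance by the same factor (e.g. proportions to counts)
# leaves the balance unchanged, which is why balances suit relative data.
rescaled = {taxon: 1000 * value for taxon, value in sample.items()}
b_rescaled = balance(rescaled, group_a, group_b)

# Swapping numerator and denominator only flips the sign.
b_swapped = balance(sample, group_b, group_a)

print(b, b_rescaled, b_swapped)
```

A set of such balances forming an orthonormal basis gives one system of ILR coordinates. Different bases yield different sets of balances for the same data, which is the dependence on the choice of transform that the abstract refers to.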