Department of Quantitative Health Sciences, Division of Computational Biology, Mayo Clinic, Rochester, MN 55905, USA.
Department of Statistics, Texas A&M University, College Station, TX 77840, USA.
Bioinformatics. 2022 Oct 31;38(21):4969-4971. doi: 10.1093/bioinformatics/btac618.
Because microbiome data are sparse and high-dimensional, they are routinely summarized as pairwise distances that capture compositional differences between samples. Many biological insights can be gained by analyzing the distance matrix in relation to covariates of interest. A microbiome sampling method that characterizes the inter-sample relationship more reproducibly is expected to yield higher statistical power. Traditionally, the intraclass correlation coefficient (ICC) has been used to quantify the reproducibility of a univariate measurement using technical replicates. In this work, we extend the traditional ICC to distance measures and propose a distance-based ICC (dICC). We derive the asymptotic distribution of the sample-based dICC to facilitate statistical inference. We illustrate dICC using a real dataset from a metagenomic reproducibility study.
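To make the idea of a distance-based ICC concrete, the following is a minimal sketch under stated assumptions, not the GUniFrac implementation. It relies on the fact that, for a univariate measurement, the classical ICC equals one minus the ratio of the expected squared difference between replicates of the same sample to the expected squared difference between measurements of different samples. The sketch applies the same ratio to an arbitrary distance matrix. The function name `distance_icc`, the use of squared distances, and the toy data are illustrative assumptions.

```python
from itertools import combinations


def distance_icc(dist, groups):
    """Illustrative distance-based ICC (an assumed form, not the package code).

    dist   -- symmetric matrix (list of lists) of pairwise distances
    groups -- groups[i] is the biological sample that measurement i replicates
    Returns 1 - mean(within-sample d^2) / mean(between-sample d^2).
    """
    within, between = [], []
    for i, j in combinations(range(len(groups)), 2):
        squared = dist[i][j] ** 2
        if groups[i] == groups[j]:
            within.append(squared)   # technical replicates of one sample
        else:
            between.append(squared)  # measurements of different samples
    if not within or not between:
        raise ValueError("need both within-sample and between-sample pairs")
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return 1.0 - mean_within / mean_between


# Toy data (hypothetical): two samples, two technical replicates each,
# with absolute difference on a single measurement as the distance.
values = [0.0, 0.2, 1.0, 1.2]
groups = ["A", "A", "B", "B"]
dist = [[abs(a - b) for b in values] for a in values]

dicc_value = distance_icc(dist, groups)
print(round(dicc_value, 4))
```

Values near 1 indicate that replicates of the same sample are much closer to each other than to other samples. Any distance matrix, for example one built from a microbiome dissimilarity measure, could be passed in place of the toy matrix.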
dICC is implemented in the R CRAN package GUniFrac.
Supplementary data are available at Bioinformatics online.