Cui Ying, Peng Limin, Hu Yijuan, Lai HuiChuan J
Department of Biostatistics and Bioinformatics, Emory University, Atlanta, U.S.A.
J R Stat Soc Ser C Appl Stat. 2021 Aug;70(4):1027-1048. doi: 10.1111/rssc.12497. Epub 2021 Jun 11.
Evaluating the reproducibility or agreement of microbiome measurements is often a crucial step in ensuring rigorous downstream analyses in microbiome studies. In this paper, we address this need by developing adaptations of Lin's concordance correlation coefficient (CCC) tailored to microbiome studies. We introduce a general formulation of the new CCC measures based on a distance function that appropriately characterizes the discrepancy between microbiome compositional measurements. We thoroughly study the special cases that adopt the Euclidean distance and the Aitchison distance. Our proposals appropriately account for the unique features of microbiome compositional data, including high dimensionality, dependency among individual relative abundances, and the presence of many zeros. We further investigate a practical compound approach to help better understand the sources of data inconsistency. Extensive simulation studies are conducted to evaluate the utility of the proposed methods in realistic scenarios. We also apply the proposed methods to a microbiome validation dataset from the .. (FIRST) study. Our analyses offer useful insight into the extent of data variation resulting from two different experimental procedures, as well as the heterogeneous patterns of that variation across genera.
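To make the idea of a distance-based CCC concrete, the following is a minimal sketch and not the authors' estimator. It assumes the coefficient takes the form 1 minus the ratio of the observed mean squared distance between paired measurements to the mean squared distance over all cross pairings, which stands in for the value expected under independence. This form is chosen only because it reduces to Lin's CCC (with 1/n variance estimates) for scalar measurements. The function names, the `pseudo` pseudo-count used to handle zeros before the log-ratio transform, and the toy data are all hypothetical; the paper's treatment of zeros may differ.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def clr(p, pseudo=1e-6):
    """Centered log-ratio transform of a composition.

    `pseudo` is added to every part so that zeros can be logged.
    This zero handling is an assumption of the sketch.
    """
    logs = [math.log(a + pseudo) for a in p]
    center = sum(logs) / len(logs)
    return [value - center for value in logs]


def aitchison(u, v, pseudo=1e-6):
    """Aitchison distance: Euclidean distance between clr-transformed parts."""
    return euclidean(clr(u, pseudo), clr(v, pseudo))


def distance_ccc(xs, ys, dist):
    """Distance-based concordance between paired samples xs[i], ys[i].

    observed: mean squared distance over the matched pairs.
    expected: mean squared distance over all n * n cross pairings,
              used here as a stand-in for independence.
    """
    n = len(xs)
    observed = sum(dist(x, y) ** 2 for x, y in zip(xs, ys)) / n
    expected = sum(dist(x, y) ** 2 for x in xs for y in ys) / (n * n)
    return 1.0 - observed / expected


# Scalar toy data: the sketch agrees with Lin's CCC, which is 6/7 here.
xs = [[1.0], [2.0], [3.0]]
ys = [[1.0], [2.0], [4.0]]
print(round(distance_ccc(xs, ys, euclidean), 4))  # 0.8571

# Toy compositions from two hypothetical procedures, including a zero.
proc_a = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.7, 0.0, 0.3]]
proc_b = [[0.4, 0.4, 0.2], [0.2, 0.5, 0.3], [0.6, 0.1, 0.3]]
print(distance_ccc(proc_a, proc_b, euclidean))
print(distance_ccc(proc_a, proc_b, aitchison))
```

Passing the distance as an argument mirrors the general formulation described in the abstract, so that the Euclidean and Aitchison cases differ only in the function supplied.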