Department of Biostatistics, University of Iowa College of Public Health, 145 N Riverside Dr, 52242, IA, USA.
Department of Biostatistics, Yale School of Public Health, 60 College St, 06510, CT, USA.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab059.
A major task in the analysis of microbiome data is to identify microbes associated with differing biological conditions. Before analysis, raw data must be adjusted so that counts from different samples are comparable. A typical approach is to estimate normalization factors by which all counts in a sample are multiplied or divided. However, the inherent variation associated with estimating normalization factors is often not accounted for in subsequent analysis, leading to a loss of precision. Rank normalization is a nonparametric alternative to estimating normalization factors, in which each count for a microbial feature is replaced by its intrasample rank. Although rank normalization has been applied successfully to microarray analysis, it has yet to be explored for microbiome data, which are characterized by high frequencies of zeros, strongly correlated features and compositionality. We propose rank normalization as an alternative to the estimation of normalization factors and examine its performance when paired with a two-sample t-test. On a rigorous third-party benchmarking simulation, it offers strong control over the false discovery rate and, at sample sizes greater than 50 per treatment group, improved performance over commonly used normalization factors paired with t-tests, Wilcoxon rank-sum tests and methodologies implemented by R packages. On two real datasets, it yielded valid and reproducible results that agreed strongly with the original findings and the existing literature, further demonstrating its robustness and future potential. Availability: The data underlying this article are available online, along with R code and supplementary materials, at https://github.com/matthewlouisdavisBioStat/Rank-Normalization-Empowers-a-T-Test.
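As an illustration of the idea summarized in the abstract, the following minimal Python sketch replaces each count with its rank within its own sample and then computes a two-sample t statistic per feature on those ranks. It is not the authors' implementation (their R code is in the repository linked above). The function names, the toy counts, the use of average ranks for ties and the choice of the Welch (unequal-variance) form of the t statistic are all assumptions made for this example. P-values and multiple-testing adjustment are omitted.

```python
from math import sqrt
from statistics import mean, variance


def intrasample_ranks(counts):
    """Rank the features of ONE sample; tied counts share their average rank.

    Tie handling by average rank is an assumption of this sketch.
    """
    order = sorted(range(len(counts)), key=lambda i: counts[i])
    ranks = [0.0] * len(counts)
    start = 0
    while start < len(order):
        # Extend 'end' over the run of features tied with order[start].
        end = start
        while end + 1 < len(order) and counts[order[end + 1]] == counts[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the tied run
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks


def welch_t(a, b):
    """Welch two-sample t statistic (assumed variant); NaN if both groups are constant."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else float("nan")


def rank_normalized_t(group_a, group_b):
    """Rank-normalize every sample, then test each feature across the two groups.

    Each group is a list of samples; each sample is a list of feature counts.
    """
    ranks_a = [intrasample_ranks(sample) for sample in group_a]
    ranks_b = [intrasample_ranks(sample) for sample in group_b]
    n_features = len(group_a[0])
    return [
        welch_t([r[j] for r in ranks_a], [r[j] for r in ranks_b])
        for j in range(n_features)
    ]


# Hypothetical toy counts: 3 samples per group, 4 microbial features per sample.
group_a = [[0, 5, 12, 0], [0, 9, 30, 1], [2, 4, 20, 0]]
group_b = [[7, 0, 15, 0], [10, 1, 40, 0], [6, 0, 9, 3]]
t_stats = rank_normalized_t(group_a, group_b)
```

Because the ranks are computed within each sample, no per-sample normalization factor has to be estimated. In this toy example the third feature has the top rank in every sample, so its t statistic is undefined and the sketch returns NaN for it.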