Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America.
PLoS Comput Biol. 2023 Sep 1;19(9):e1011447. doi: 10.1371/journal.pcbi.1011447. eCollection 2023 Sep.
Microbiome sequencing data normalization is crucial for eliminating technical bias and ensuring accurate downstream analysis. However, this process can be challenging due to the high frequency of zero counts in microbiome data. We propose a novel reference-based normalization method called normalization via rank similarity (RSim) that corrects sample-specific biases, even in the presence of many zero counts. Unlike other normalization methods, RSim does not require additional assumptions or treatments for the high prevalence of zero counts. This makes it robust and minimizes potential bias resulting from procedures that address zero counts, such as pseudo-counts. Our numerical experiments demonstrate that RSim reduces false discoveries, improves detection power, and reveals true biological signals in downstream tasks such as PCoA plotting, association analysis, and differential abundance analysis.