Chen Li, Reeve James, Zhang Lujun, Huang Shengbing, Wang Xuefeng, Chen Jun
Department of Health Outcomes Research and Policy, Harrison School of Pharmacy, Auburn University, Auburn, AL, USA.
Bioinformatics and Computational Biology Program, University of Minnesota-Rochester, Rochester, MN, USA.
PeerJ. 2018 Apr 2;6:e4600. doi: 10.7717/peerj.4600. eCollection 2018.
Normalization is the first critical step in microbiome sequencing data analysis and is used to account for variable library sizes. Current RNA-Seq-based normalization methods that have been adapted for microbiome data fail to consider the unique characteristics of such data, which contain a vast number of zeros due to the physical absence or under-sampling of the microbes. Normalization methods that specifically address zero-inflation remain largely undeveloped. Here we propose the geometric mean of pairwise ratios, a simple but effective normalization method for zero-inflated sequencing data such as microbiome data. Simulation studies and analyses of real datasets demonstrate that the proposed method is more robust than competing methods, leading to more powerful detection of differentially abundant taxa and higher reproducibility of the relative abundances of taxa.