Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States of America.
Center for Microbiome Informatics and Therapeutics, Cambridge, MA, United States of America.
PLoS Comput Biol. 2018 Apr 23;14(4):e1006102. doi: 10.1371/journal.pcbi.1006102. eCollection 2018 Apr.
High-throughput data generation platforms, such as mass spectrometry, microarrays, and second-generation sequencing, are susceptible to batch effects due to run-to-run variation in reagents, equipment, protocols, or personnel. Currently, batch-correction methods are not commonly applied to microbiome sequencing datasets. In this paper, we compare different batch-correction methods applied to microbiome case-control studies. We introduce a model-free normalization procedure in which features (i.e., bacterial taxa) in case samples are converted to percentiles of the equivalent features in control samples within a study prior to pooling data across studies. We compare this percentile-normalization method with traditional meta-analysis methods for combining independent p-values and with limma and ComBat, widely used batch-correction models developed for RNA microarray data. Overall, we show that percentile-normalization is a simple, non-parametric approach for correcting batch effects and improving sensitivity in case-control meta-analyses.
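The percentile-normalization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`percentile_of_score`, `percentile_normalize`), the input layout, and the tie-handling rule (ties counted as half) are assumptions made for illustration. The sketch also converts control samples against their own distribution so that every sample in a study ends up on the same 0-100 scale before pooling, which is likewise an assumption.

```python
from typing import Dict, List


def percentile_of_score(reference: List[float], value: float) -> float:
    """Percentile rank (0-100) of `value` within `reference`.

    Assumption: ties are counted as half, so a value equal to one
    control out of four scores 12.5 rather than 0 or 25.
    """
    if not reference:
        raise ValueError("reference must contain at least one control value")
    below = sum(1 for r in reference if r < value)
    ties = sum(1 for r in reference if r == value)
    return 100.0 * (below + 0.5 * ties) / len(reference)


def percentile_normalize(
    abundances: Dict[str, List[float]], is_control: List[bool]
) -> Dict[str, List[float]]:
    """Convert each feature to percentiles of that feature's control values.

    `abundances` maps a taxon name to one value per sample within a
    single study; `is_control` flags which samples are controls.
    Both names are hypothetical, chosen for this example.
    """
    normalized = {}
    for taxon, values in abundances.items():
        # The reference distribution is built from controls of this study only.
        controls = [v for v, c in zip(values, is_control) if c]
        normalized[taxon] = [percentile_of_score(controls, v) for v in values]
    return normalized


# Toy study: four controls followed by two cases (made-up abundances).
study = {"Bacteroides": [0.1, 0.2, 0.3, 0.4, 0.35, 0.05]}
labels = [True, True, True, True, False, False]
result = percentile_normalize(study, labels)
print(result["Bacteroides"])
```

Because each study is normalized against its own controls, the resulting percentiles from several studies can be concatenated and tested together, which is the pooling step the abstract refers to.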