Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, 3052, Australia.
Brief Bioinform. 2020 Dec 1;21(6):1954-1970. doi: 10.1093/bib/bbz105.
Microbial communities have been increasingly studied in recent years to investigate their role in ecological habitats. However, microbiome studies are difficult to reproduce or replicate because they may suffer from confounding factors that are unavoidable in practice and that originate from biological, technical or computational sources. In this review, we define batch effects as unwanted variation introduced by confounding factors that are unrelated to any factor of interest. Computational and analytical methods are required to remove or account for batch effects. However, the inherent characteristics of microbiome data (e.g. sparse, compositional and multivariate) challenge the development and application of batch effect adjustment methods, whether they account for batch effects or correct them. We present commonly encountered sources of batch effects and illustrate them in several case studies. We discuss the limitations of current methods, whose assumptions are often not met because of the peculiarities of microbiome data. We provide practical guidelines for assessing the efficiency of the methods based on visual and numerical outputs, and a thorough tutorial to reproduce the analyses conducted in this review.