Gene Expression and RNA Metabolism Laboratory, Instituto de Biomedicina de Valencia, Consejo Superior de Investigaciones Científicas, Valencia 46010, Spain.
Multivariate Statistical Engineering Group, Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia 46022, Spain.
Bioinformatics. 2022 Apr 28;38(9):2657-2658. doi: 10.1093/bioinformatics/btac132.
Batch effects in omics datasets are usually a source of technical noise that masks the biological signal and hampers data analysis. Batch effect removal has been widely addressed for individual omics technologies. However, multi-omic datasets may combine data obtained in different batches, in which omics type and batch are often confounded. Moreover, systematic biases may be introduced unnoticed during data acquisition, creating a hidden batch effect. Current methods fail to address batch effect correction in these cases.
In this article, we introduce the MultiBaC R package, a tool for batch effect removal in multi-omics and hidden batch effect scenarios. The package includes a variety of graphical outputs for validating the model and assessing the batch effect correction.
The MultiBaC package is available on Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/MultiBaC.html) and GitHub (https://github.com/ConesaLab/MultiBaC.git). The data underlying this article are available in the Gene Expression Omnibus repository (accession numbers GSE11521, GSE1002, GSE56622 and GSE43747).
Supplementary data are available at Bioinformatics online.