reComBat: batch-effect removal in large-scale multi-source gene-expression data integration.

Authors

Adamer Michael F, Brüningk Sarah C, Tejada-Arranz Alejandro, Estermann Fabienne, Basler Marek, Borgwardt Karsten

Affiliations

Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland.

Swiss Institute for Bioinformatics (SIB), Lausanne 1015, Switzerland.

Publication information

Bioinform Adv. 2022 Oct 6;2(1):vbac071. doi: 10.1093/bioadv/vbac071. eCollection 2022.

Abstract

MOTIVATION

As public databases accumulate ever more omics data produced all over the world under vastly different experimental conditions, data integration has become a crucial step in many data-driven bioinformatics applications. The challenge of batch-effect removal for entire databases lies in the large number of batches and the extent of biological variation, which together can make the design matrix singular. No common batch-correction algorithm currently solves this problem satisfactorily.

RESULTS

We present reComBat, a regularized version of the empirical Bayes method, to overcome this limitation, and benchmark it against popular approaches for the harmonization of public gene-expression data (both microarray and bulk RNA-seq) of the human opportunistic pathogen Pseudomonas aeruginosa. Batch effects are successfully mitigated while biologically meaningful gene-expression variation is retained. reComBat fills the gap in batch-correction approaches applicable to large-scale, public omics databases and opens up new avenues for data-driven analysis of complex biological processes beyond the scope of a single study.
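The singularity problem and the effect of regularization can be illustrated with a small sketch. It is not the reComBat implementation: the helper names `build_design` and `ridge_fit`, the toy labels, and the choice of a plain ridge penalty are assumptions made for illustration, since the abstract states only that the method is regularized. The example builds a design matrix in which every batch has its own biological condition, shows that it is rank-deficient, and shows that adding a penalty makes the normal equations solvable.

```python
import numpy as np


def build_design(batch_ids, condition_ids):
    """One-hot encode batches and conditions and stack them column-wise."""
    batches = np.unique(batch_ids)
    conditions = np.unique(condition_ids)
    B = (batch_ids[:, None] == batches[None, :]).astype(float)
    C = (condition_ids[:, None] == conditions[None, :]).astype(float)
    return np.hstack([B, C])


def ridge_fit(X, y, lam):
    """Solve the penalized normal equations (X^T X + lam * I) beta = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


# Hypothetical toy data: 3 batches, each measured under its own condition,
# so batch and condition are completely confounded.
batch_ids = np.array([0, 0, 1, 1, 2, 2])
condition_ids = np.array([0, 0, 1, 1, 2, 2])
X = build_design(batch_ids, condition_ids)

# The design has 6 columns but fewer independent directions, so X^T X
# is singular and ordinary least squares has no unique solution.
rank = np.linalg.matrix_rank(X)
gram = X.T @ X

# Simulated expression values for one gene across the 6 samples.
rng = np.random.default_rng(0)
y = rng.normal(size=6)

# With a positive penalty the system becomes positive definite and solvable.
beta = ridge_fit(X, y, lam=1.0)
print("columns:", X.shape[1], "rank:", rank)
print("regularized coefficients:", beta)
```

In this toy setting the penalty splits each shared effect evenly between the confounded batch and condition columns. The penalty strength `lam` is a free parameter here, and how it should be chosen for real data is outside the scope of this sketch.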

AVAILABILITY AND IMPLEMENTATION

The code is available at https://github.com/BorgwardtLab/reComBat; all data and evaluation code can be found at https://github.com/BorgwardtLab/batchCorrectionPublicData.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics Advances online.
