pyComBat, a Python tool for batch effects correction in high-throughput molecular data using empirical Bayes methods.

Affiliations

Epigene Labs, Paris, France.

University of Edinburgh, Edinburgh, UK.

Publication information

BMC Bioinformatics. 2023 Dec 7;24(1):459. doi: 10.1186/s12859-023-05578-5.

PMID: 38057718
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10701943/
Abstract

BACKGROUND

Variability in datasets is not only the product of biological processes; it is also the product of technical biases. ComBat and ComBat-Seq are among the most widely used tools for correcting these technical biases, known as batch effects, in microarray and RNA-Seq expression data, respectively.
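The following is a minimal sketch of the simplest location/scale adjustment for a single gene, which makes the idea of a batch effect concrete. Each batch is standardized with its own mean and standard deviation and then mapped back onto the gene's overall mean and pooled within-batch standard deviation. This is a simplified illustration and not ComBat itself. It omits covariates and the empirical Bayes shrinkage that ComBat applies, and the function name `location_scale_adjust` is hypothetical.

```python
from statistics import mean, stdev


def location_scale_adjust(values, batches):
    """Naive location/scale batch adjustment for ONE gene (hypothetical helper).

    values:  expression values of the gene, one per sample.
    batches: batch label of each sample (same length as values).
    Every batch needs at least 2 samples and a non-zero standard deviation.
    No covariates and no empirical Bayes shrinkage (unlike ComBat).
    """
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")

    # Group the gene's values by batch label.
    groups = {}
    for v, b in zip(values, batches):
        groups.setdefault(b, []).append(v)

    # Per-batch location (mean) and scale (sample standard deviation).
    batch_stats = {b: (mean(g), stdev(g)) for b, g in groups.items()}

    # Shared target: overall mean and pooled within-batch standard deviation.
    grand_mean = mean(values)
    within_ss = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups.values())
    pooled_sd = (within_ss / (len(values) - len(groups))) ** 0.5

    # Remove each batch's own location/scale, then impose the shared one.
    return [
        (v - batch_stats[b][0]) / batch_stats[b][1] * pooled_sd + grand_mean
        for v, b in zip(values, batches)
    ]
```

For example, `location_scale_adjust([1, 2, 3, 11, 12, 13], ["a", "a", "a", "b", "b", "b"])` returns `[6.0, 7.0, 8.0, 6.0, 7.0, 8.0]`, which removes the 10-unit shift between batches `a` and `b`. With very small batches, these per-batch estimates are noisy. ComBat addresses that problem by shrinking them across genes.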

RESULTS

In this technical note, we present a new Python implementation of ComBat and ComBat-Seq. While the mathematical framework is strictly the same, we show here that our implementations: (i) give similar batch effect correction results; (ii) are as fast as or faster than the original R implementations; and (iii) offer new tools for the bioinformatics community to participate in its development. pyComBat is implemented in Python and is distributed under the GPL-3.0 license ( https://www.gnu.org/licenses/gpl-3.0.en.html ) as a module of the inmoose package. The source code is available at https://github.com/epigenelabs/inmoose and the Python package at https://pypi.org/project/inmoose .
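The abstract states that the mathematical framework is unchanged but does not restate it. For reference, the location/scale model from the original ComBat paper (Johnson, Li and Rabinovic, Biostatistics, 2007) is summarized below as we understand it. The notation is ours and is not taken from this technical note. The indices are gene g and sample j in batch i, and X_{ij} denotes the sample covariates.

```latex
\begin{aligned}
Y_{ijg} &= \alpha_g + X_{ij}^{\top}\beta_g + \gamma_{ig} + \delta_{ig}\,\varepsilon_{ijg},
  \qquad \varepsilon_{ijg} \sim \mathcal{N}(0, \sigma_g^2) \\
Z_{ijg} &= \frac{Y_{ijg} - \hat\alpha_g - X_{ij}^{\top}\hat\beta_g}{\hat\sigma_g} \\
Y^{*}_{ijg} &= \frac{\hat\sigma_g}{\hat\delta^{*}_{ig}}\left(Z_{ijg} - \hat\gamma^{*}_{ig}\right)
  + \hat\alpha_g + X_{ij}^{\top}\hat\beta_g
\end{aligned}
```

In this model, \hat\gamma^{*}_{ig} and \hat\delta^{*}_{ig} are empirical Bayes estimates of the additive and multiplicative batch effects on the standardized scale Z. ComBat-Seq instead models raw counts with a negative binomial regression, so this Gaussian form describes ComBat only.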

CONCLUSIONS

We present a new Python implementation of the state-of-the-art tools ComBat and ComBat-Seq for correcting batch effects in microarray and RNA-Seq data. This new implementation is based on the same mathematical frameworks as ComBat and ComBat-Seq. It offers similar batch effect correction power at a reduced computational cost.
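The core of ComBat's empirical Bayes step is pooling information across genes to stabilize the per-batch estimates, and the abstract says pyComBat keeps this framework. Below is a minimal pure-Python sketch of the parametric shrinkage for a single batch of already-standardized data. It assumes a normal prior on the additive effect and an inverse-gamma prior on the variance, both fitted by the method of moments, as in the R `sva` ComBat code. It is not the inmoose implementation, and the function name `parametric_eb_shrink` is hypothetical. One deliberate deviation is labeled in the code: convergence is tested on the absolute change rather than the relative change.

```python
from statistics import mean, variance


def parametric_eb_shrink(z, tol=1e-6, max_iter=1000):
    """Parametric empirical Bayes shrinkage for ONE batch (hypothetical helper).

    z: list of genes, each a list of standardized values Z_ijg for the
       samples of this batch. Needs >= 2 samples and >= 2 genes, and the
       gene-wise variances must not all be equal (else the prior is degenerate).
    Returns (gamma_star, delta2_star): posterior additive effects and
    posterior variance (multiplicative) effects, one per gene.
    """
    gamma_hat = [mean(row) for row in z]        # per-gene batch mean
    delta2_hat = [variance(row) for row in z]   # per-gene batch variance

    # Normal prior on gamma, fitted across genes by the method of moments.
    gamma_bar = mean(gamma_hat)
    tau2 = variance(gamma_hat)

    # Inverse-gamma prior on delta^2, fitted by the method of moments.
    m = mean(delta2_hat)
    s2 = variance(delta2_hat)
    a_prior = (2 * s2 + m ** 2) / s2
    b_prior = (m * s2 + m ** 3) / s2

    gamma_star = list(gamma_hat)
    delta2_star = list(delta2_hat)
    for _ in range(max_iter):
        # Posterior mean of gamma given the current delta^2 estimate:
        # a precision-weighted average of gamma_hat and gamma_bar.
        new_gamma = [
            (tau2 * len(row) * gh + d2 * gamma_bar) / (tau2 * len(row) + d2)
            for row, gh, d2 in zip(z, gamma_hat, delta2_star)
        ]
        # Posterior mean of delta^2 given the updated gamma.
        new_delta2 = [
            (0.5 * sum((v - g) ** 2 for v in row) + b_prior)
            / (len(row) / 2 + a_prior - 1)
            for row, g in zip(z, new_gamma)
        ]
        # Deviation from sva: absolute rather than relative change.
        change = max(
            max(abs(new - old) for new, old in zip(new_gamma, gamma_star)),
            max(abs(new - old) for new, old in zip(new_delta2, delta2_star)),
        )
        gamma_star, delta2_star = new_gamma, new_delta2
        if change < tol:
            break
    return gamma_star, delta2_star
```

Each shrunken \gamma^{*} lies between the gene's own batch mean and the across-gene average. The pull toward the average is stronger when the gene's within-batch variance is large or the batch is small. This is why ComBat stays usable with few samples per batch.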


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4fa/10701943/82985ca597fd/12859_2023_5578_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4fa/10701943/99dfcaf4fef6/12859_2023_5578_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4fa/10701943/0bec9c3246e3/12859_2023_5578_Fig3_HTML.jpg

