Removing batch effects from purified plasma cell gene expression microarrays with modified ComBat.

Author information

Stein Caleb K, Qu Pingping, Epstein Joshua, Buros Amy, Rosenthal Adam, Crowley John, Morgan Gareth, Barlogie Bart

Affiliations

Myeloma Institute for Research and Therapy, University of Arkansas for Medical Sciences, Little Rock, AR, USA.

Cancer Research and Biostatistics, Seattle, WA, USA.

Publication information

BMC Bioinformatics. 2015 Feb 25;16:63. doi: 10.1186/s12859-015-0478-3.

DOI:10.1186/s12859-015-0478-3

PMID:25887219

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4355992/

Abstract

BACKGROUND

Gene expression profiling (GEP) via microarray analysis is a widely used tool for risk assessment and other patient diagnostics in clinical settings. However, non-biological factors such as systematic changes in sample preparation, differences in scanners, and other potential batch effects are often unavoidable in long-term studies and meta-analyses. To reduce the impact of batch effects on microarray data, Johnson, Rabinovic, and Li developed ComBat for use when combining batches of gene expression microarray data. We propose a modification to ComBat that centers the data on the location and scale of a pre-determined 'gold-standard' batch. This modified ComBat (M-ComBat) is designed specifically for meta-analysis and batch-effect adjustment when the adjusted data will be used with predictive models that were validated and fixed on historical data from a 'gold-standard' batch.
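As a rough illustration of what centering data on the location and scale of a 'gold-standard' batch means, the sketch below maps each probe's values in every other batch onto the reference batch's per-probe mean and standard deviation, and leaves the reference batch untouched. It is a simplified assumption, not M-ComBat itself: it omits the empirical Bayes shrinkage of batch parameters and any covariates that ComBat uses. The function name `adjust_to_reference` and the dict-of-lists data layout are hypothetical.

```python
from statistics import mean, stdev


def adjust_to_reference(data, batches, ref):
    """Map each probe onto the reference batch's per-probe mean and SD.

    Simplified sketch: plain location/scale matching, with NO empirical
    Bayes shrinkage (unlike ComBat and M-ComBat).

    data:    dict probe -> list of expression values, one per sample
    batches: list of batch labels aligned with each probe's value list
    ref:     label of the 'gold-standard' batch, returned unchanged
    Each batch needs at least two samples and non-zero variance per probe.
    """
    labels = sorted(set(batches))
    out = {}
    for probe, values in data.items():
        # Group this probe's values by batch label.
        by_batch = {b: [v for v, lab in zip(values, batches) if lab == b]
                    for b in labels}
        ref_mean, ref_sd = mean(by_batch[ref]), stdev(by_batch[ref])
        stats = {b: (mean(v), stdev(v)) for b, v in by_batch.items()}
        adjusted = []
        for v, lab in zip(values, batches):
            if lab == ref:
                adjusted.append(v)  # reference batch passes through as-is
            else:
                m, s = stats[lab]
                # Standardize within the batch, then rescale to the reference.
                adjusted.append((v - m) / s * ref_sd + ref_mean)
        out[probe] = adjusted
    return out


example = {"probe_1": [5.0, 6.0, 7.0, 9.0, 11.0, 13.0]}
labels = ["old", "old", "old", "new", "new", "new"]
print(adjust_to_reference(example, labels, ref="old"))
# → {'probe_1': [5.0, 6.0, 7.0, 5.0, 6.0, 7.0]}
```

In the example the 'new' batch (mean 11, SD 2) takes on the 'old' batch's mean of 6 and SD of 1, while the 'old' values are unchanged; keeping the reference batch fixed is the property that lets a model fixed on that batch be applied with its original cutoffs.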

RESULTS

We combined data from MIRT across two batches ('Old' and 'New' Kit sample preparation), together with external data sets from the HOVON-65/GMMG-HD4 and MRC-IX trials, into a combined set, first without transformation and then with both ComBat and M-ComBat transformations. Fixed and validated gene risk signatures developed at MIRT on the Old Kit standard (GEP5, GEP70, and GEP80 risk scores) were compared across these combined data sets. Both ComBat and M-ComBat eliminated all of the differences among probes caused by systematic batch effects: before transformation, over 98% of all probes were significantly different by ANOVA at a q-value threshold of 0.01, whereas after ComBat or M-ComBat no probes were significant. The mean and distribution of risk scores, as well as the proportion of high-risk subjects identified, matched the 'gold-standard' batch more closely with M-ComBat than with ComBat. Risk score performance improved overall with either ComBat or M-ComBat; however, M-ComBat combined with the original, optimal risk cutoffs gave our study greater ability to identify smaller cohorts of high-risk subjects.
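The probe-level significance count can be sketched as follows, assuming one ANOVA p-value per probe has already been computed across batches (for example with `scipy.stats.f_oneway`). The abstract does not state how its q-values were obtained; this sketch substitutes Benjamini-Hochberg adjusted p-values, so treat it as an approximation of the procedure rather than the authors' code. The function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Used here as a stand-in for the q-values reported in the abstract
    (an assumption; the exact q-value method is not stated there).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(pvalues, threshold=0.01):
    """Count probes whose adjusted p-value falls below the threshold."""
    return sum(q < threshold for q in benjamini_hochberg(pvalues))


probe_pvalues = [0.001, 0.004, 0.03, 0.5]  # illustrative values, not study data
print([round(q, 4) for q in benjamini_hochberg(probe_pvalues)])
# → [0.004, 0.008, 0.04, 0.5]
print(count_significant(probe_pvalues))
# → 2
```

Running the same count on p-values from untransformed data and from batch-adjusted data gives the kind of before/after comparison the abstract reports (over 98% of probes significant before adjustment, none after).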

CONCLUSION

M-ComBat is a practical modification to an accepted method that offers greater control over the location and scale of batch-effect-adjusted data. M-ComBat allows historical models to function as intended on future samples despite known, often unavoidable, systematic changes to gene expression data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/791c/4355992/24619348cfa6/12859_2015_478_Fig1_HTML.jpg

Similar articles

Blind estimation and correction of microarray batch effect.

PLoS One. 2020 Apr 9;15(4):e0231446. doi: 10.1371/journal.pone.0231446. eCollection 2020.

Removing batch effects in analysis of expression microarray data: an evaluation of six batch adjustment methods.

PLoS One. 2011 Feb 28;6(2):e17238. doi: 10.1371/journal.pone.0017238.

Removing Batch Effects from Longitudinal Gene Expression - Quantile Normalization Plus ComBat as Best Approach for Microarray Transcriptome Data.

PLoS One. 2016 Jun 7;11(6):e0156594. doi: 10.1371/journal.pone.0156594. eCollection 2016.

Batch effect reduction of microarray data with dependent samples using an empirical Bayes approach (BRIDGE).

Stat Appl Genet Mol Biol. 2021 Dec 14;20(4-6):101-119. doi: 10.1515/sagmb-2021-0020.

Covariance adjustment for batch effect in gene expression data.

Stat Med. 2014 Jul 10;33(15):2681-95. doi: 10.1002/sim.6157. Epub 2014 Mar 28.

Methods that remove batch effects while retaining group differences may lead to exaggerated confidence in downstream analyses.

Biostatistics. 2016 Jan;17(1):29-39. doi: 10.1093/biostatistics/kxv027. Epub 2015 Aug 13.

Gene expression analysis in clear cell renal cell carcinoma using gene set enrichment analysis for biostatistical management.

BJU Int. 2011 Jul;108(2 Pt 2):E29-35. doi: 10.1111/j.1464-410X.2010.09794.x. Epub 2011 Mar 16.

Propensity scores as a novel method to guide sample allocation and minimize batch effects during the design of high throughput experiments.

BMC Bioinformatics. 2023 Mar 7;24(1):86. doi: 10.1186/s12859-023-05202-6.

Comparison of statistical methods and the use of quality control samples for batch effect correction in human transcriptome data.

PLoS One. 2018 Aug 30;13(8):e0202947. doi: 10.1371/journal.pone.0202947. eCollection 2018.

Cited by

Interpretable Machine Learning for Cross-Cohort Prediction of Motor Fluctuations in Parkinson's Disease.

Mov Disord. 2025 Aug;40(8):1604-1617. doi: 10.1002/mds.30223. Epub 2025 May 14.

Multi-cohort machine learning identifies predictors of cognitive impairment in Parkinson's disease.

NPJ Digit Med. 2025 Jul 26;8(1):482. doi: 10.1038/s41746-025-01862-1.

Subversion of mRNA degradation pathways by EWSR1::FLI1 represents a therapeutic vulnerability in Ewing sarcoma.

Nat Commun. 2025 Jul 16;16(1):6537. doi: 10.1038/s41467-025-61725-x.

High Tau expression correlates with reduced invasion and prolonged survival in Ewing sarcoma.

Cell Death Discov. 2025 May 3;11(1):216. doi: 10.1038/s41420-025-02497-7.

Interaction and verification of ferroptosis-related RNAs Rela and Stat3 in promoting sepsis-associated acute kidney injury.

Open Med (Wars). 2025 Apr 2;20(1):20251156. doi: 10.1515/med-2025-1156. eCollection 2025.

Multi-omics profiling reveals key factors involved in Ewing sarcoma metastasis.

Mol Oncol. 2025 Apr;19(4):1002-1028. doi: 10.1002/1878-0261.13788. Epub 2025 Jan 5.

Genomic and phenotypic stability of fusion-driven pediatric sarcoma cell lines.

Nat Commun. 2025 Jan 3;16(1):380. doi: 10.1038/s41467-024-55340-5.

Harmonization for Parkinson's Disease Multi-Dataset T1 MRI Morphometry Classification.

NeuroSci. 2024 Nov 29;5(4):600-613. doi: 10.3390/neurosci5040042.

Evaluation of ComBat Harmonization for Reducing Across-Tracer Differences in Regional Amyloid PET Analyses.

Hum Brain Mapp. 2024 Nov;45(16):e70068. doi: 10.1002/hbm.70068.

Thinking points for effective batch correction on biomedical data.

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae515.

References

Five gene probes carry most of the discriminatory power of the 70-gene risk model in multiple myeloma.

Leukemia. 2014 Dec;28(12):2410-3. doi: 10.1038/leu.2014.232. Epub 2014 Jul 31.

Risk stratification in multiple myeloma, part 1: characterization of high-risk disease.

Clin Adv Hematol Oncol. 2013 Aug;11(8):489-503.

Clinical, genomic, and imaging predictors of myeloma progression from asymptomatic monoclonal gammopathies (SWOG S0120).

Blood. 2014 Jan 2;123(1):78-85. doi: 10.1182/blood-2013-07-515239. Epub 2013 Oct 21.

A gene expression signature for high-risk multiple myeloma.

Leukemia. 2012 Nov;26(11):2406-13. doi: 10.1038/leu.2012.127. Epub 2012 May 8.

Batch correction of microarray data substantially improves the identification of genes differentially expressed in rheumatoid arthritis and osteoarthritis.

BMC Med Genomics. 2012 Jun 8;5:23. doi: 10.1186/1755-8794-5-23.

The role of maintenance thalidomide therapy in multiple myeloma: MRC Myeloma IX results and meta-analysis.

Blood. 2012 Jan 5;119(1):7-15. doi: 10.1182/blood-2011-06-357038. Epub 2011 Oct 20.

Pharmacogenomics of bortezomib test-dosing identifies hyperexpression of proteasome genes, especially PSMD4, as novel high-risk feature in myeloma treated with Total Therapy 3.

Blood. 2011 Sep 29;118(13):3512-24. doi: 10.1182/blood-2010-12-328252. Epub 2011 May 31.

Integrated analysis of multiple microarray datasets identifies a reproducible survival predictor in ovarian cancer.

PLoS One. 2011 Mar 29;6(3):e18202. doi: 10.1371/journal.pone.0018202.

Removing batch effects in analysis of expression microarray data: an evaluation of six batch adjustment methods.

PLoS One. 2011 Feb 28;6(2):e17238. doi: 10.1371/journal.pone.0017238.

Tackling the widespread and critical impact of batch effects in high-throughput data.

Nat Rev Genet. 2010 Oct;11(10):733-9. doi: 10.1038/nrg2825. Epub 2010 Sep 14.
