BatMan: Mitigating Batch Effects Via Stratification for Survival Outcome Prediction.

Affiliations

Division of Biostatistics, College of Public Health, Ohio State University, Columbus, OH.

Department of Population Health, New York University, New York, NY.

Publication information

JCO Clin Cancer Inform. 2023 Jun;7:e2200138. doi: 10.1200/CCI.22.00138.

Abstract

Reproducible translation of transcriptomics data has been hampered by the ubiquitous presence of batch effects. Statistical methods for managing batch effects were initially developed in the setting of sample group comparison and later borrowed for other settings such as survival outcome prediction. The most notable such method is ComBat, which adjusts for batch by including it as a covariate alongside sample groups in a linear regression. In survival prediction, however, ComBat is applied without definable groups for the survival outcome and is performed as a separate step in sequence with survival regression, even though the outcome is potentially confounded with batch. To address these issues, we propose a new method called BATch MitigAtion via stratificatioN (BatMan). It adjusts for batches by treating them as strata in survival regression and uses variable selection methods such as regularized regression to handle high dimensionality. We assess the performance of BatMan in comparison with ComBat, each used either alone or in conjunction with data normalization, in a resampling-based simulation study under various levels of predictive signal strength and patterns of batch-outcome association. Our simulations show that (1) BatMan outperforms ComBat in nearly all scenarios when there are batch effects in the data and (2) the performance of both methods can be worsened by the addition of data normalization. We further evaluate them using microRNA data for ovarian cancer from The Cancer Genome Atlas and find that BatMan outperforms ComBat, while the addition of data normalization worsens the prediction. Our study thus shows the advantage of BatMan and raises caution about the use of data normalization in the context of developing survival prediction models. The BatMan method and the simulation tool for performance assessment are implemented in R and publicly available on GitHub at LXQin/PRECISION.survival.
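
To make the idea of adjusting for batches as strata more concrete, the following is a minimal sketch of a stratified Cox partial log-likelihood, in which each event is compared only against subjects still at risk in the same batch. It is an illustration under stated assumptions, not the authors' R implementation: the function name `stratified_cox_loglik` and the toy data are hypothetical, tied event times are handled Breslow-style, and the regularization step mentioned in the abstract is omitted.

```python
import math
from collections import defaultdict


def stratified_cox_loglik(times, events, batches, X, beta):
    """Cox partial log-likelihood with risk sets restricted to each batch.

    Hypothetical helper for illustration only (not from the BatMan package).
    times   : follow-up time per subject
    events  : 1 if the event was observed, 0 if censored
    batches : batch label per subject, used as the stratum
    X       : list of covariate vectors, one per subject
    beta    : coefficient vector shared across all strata
    """
    # Group subject indices by batch so that risk sets never cross batches.
    by_batch = defaultdict(list)
    for i, b in enumerate(batches):
        by_batch[b].append(i)

    total = 0.0
    for idx in by_batch.values():
        # Linear predictor for every subject in this stratum.
        eta = {i: sum(x * b for x, b in zip(X[i], beta)) for i in idx}
        for i in idx:
            if events[i]:
                # Risk set: same-batch subjects still under observation.
                risk = [j for j in idx if times[j] >= times[i]]
                total += eta[i] - math.log(sum(math.exp(eta[j]) for j in risk))
    return total


# Toy data (made up): four subjects, one covariate, two batches.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
batches = ["A", "A", "B", "B"]
X = [[0.5], [-0.2], [1.0], [0.1]]

# Simulate an additive batch effect: shift the covariate in batch B only.
X_shifted = [[x[0] + 5.0] if b == "B" else list(x) for x, b in zip(X, batches)]

ll_original = stratified_cox_loglik(times, events, batches, X, [0.7])
ll_shifted = stratified_cox_loglik(times, events, batches, X_shifted, [0.7])
ll_pooled = stratified_cox_loglik(times, events, ["all"] * 4, X, [0.7])
```

In this sketch, `ll_original` and `ll_shifted` agree up to floating-point error, because a constant shift of the linear predictor within a stratum cancels between each event's term and its risk-set sum. Passing a single stratum label for all subjects, as in `ll_pooled`, gives the ordinary unstratified partial likelihood for comparison.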
