Overcoming the impacts of two-step batch effect correction on gene expression estimation and inference.

Affiliations

Academy of Pharmacy, Xi'an Jiaotong-Liverpool University, 111 Ren'ai Road, Dushu Lake Higher Education Town, Suzhou Industrial Park, Suzhou 215123, Jiangsu Province, PRC.

Clinical Bioinformatics, Gilead Sciences, Inc., 333 Lakeside Dr, Foster City, CA 94404.

Publication information

Biostatistics. 2023 Jul 14;24(3):635-652. doi: 10.1093/biostatistics/kxab039.

DOI:10.1093/biostatistics/kxab039
PMID:34893807
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10449015/
Abstract

Nonignorable technical variation is commonly observed across data from multiple experimental runs, platforms, or studies. These so-called batch effects can lead to difficulty in merging data from multiple sources, as they can severely bias the outcome of the analysis. Many groups have developed approaches for removing batch effects from data, usually by accommodating batch variables into the analysis (one-step correction) or by preprocessing the data prior to the formal or final analysis (two-step correction). One-step correction is often desirable due to its simplicity, but its flexibility is limited and it can be difficult to include batch variables uniformly when an analysis has multiple stages. Two-step correction allows for richer models of batch mean and variance. However, prior investigation has indicated that two-step correction can lead to incorrect statistical inference in downstream analysis. Generally speaking, two-step approaches introduce a correlation structure in the corrected data, which, if ignored, may lead to either exaggerated or diminished significance in downstream applications such as differential expression analysis. Here, we provide more intuitive and more formal evaluations of the impacts of two-step batch correction than those in the existing literature. We demonstrate that the undesired impacts of two-step correction (exaggerated or diminished significance) depend on both the nature of the study design and the batch effects. We also provide strategies for overcoming these negative impacts in downstream analyses using the estimated correlation matrix of the corrected data. We compare the results of our proposed workflow with the results from other published one-step and two-step methods and show that our methods lead to more consistent false discovery control and power of detection across a variety of batch effect scenarios. Software for our method is available through GitHub (https://github.com/jtleek/sva-devel) and will be available in future versions of the sva R package in the Bioconductor project (https://bioconductor.org/packages/release/bioc/html/sva.html).
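
The claim that two-step correction introduces a correlation structure can be illustrated with a deliberately simplified sketch. It assumes the first step is nothing more than subtracting each batch's mean from its samples (a stand-in chosen for illustration, not ComBat and not the authors' method), and that the uncorrected samples are independent with unit variance. The function names and the batch layout are hypothetical. Under those assumptions the covariance of the corrected data is C Cᵀ, where C is the batch-centering matrix, and samples sharing a batch of size n become negatively correlated at -1/(n-1).

```python
import math
from fractions import Fraction


def batch_centering_matrix(batches):
    """Build C so that C @ y subtracts from each sample the mean of its batch."""
    n = len(batches)
    sizes = {b: batches.count(b) for b in set(batches)}
    C = []
    for i in range(n):
        row = []
        for j in range(n):
            identity = Fraction(1) if i == j else Fraction(0)
            if batches[i] == batches[j]:
                shared = Fraction(1, sizes[batches[i]])
            else:
                shared = Fraction(0)
            row.append(identity - shared)
        C.append(row)
    return C


def covariance_of_corrected(C):
    """Covariance of C @ y when the samples in y are independent with unit variance."""
    n = len(C)
    return [
        [sum(C[i][k] * C[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def correlation(cov, i, j):
    """Correlation between corrected samples i and j (every batch needs >= 2 samples)."""
    return float(cov[i][j]) / math.sqrt(float(cov[i][i] * cov[j][j]))


# Hypothetical design: batch A has 3 samples, batch B has 4.
batches = ["A", "A", "A", "B", "B", "B", "B"]
cov = covariance_of_corrected(batch_centering_matrix(batches))

print(correlation(cov, 0, 1))  # same batch of size 3: approximately -1/2
print(correlation(cov, 3, 4))  # same batch of size 4: approximately -1/3
print(correlation(cov, 0, 3))  # different batches: 0.0
```

A downstream test that treats the corrected samples as independent ignores these off-diagonal terms, which is the mechanism the abstract points to when it describes exaggerated or diminished significance. Realistic two-step methods also adjust for covariates and batch variances, so the induced correlation generally depends on the study design as well.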

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba87/10449015/0d7482e30759/kxab039f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba87/10449015/793f7ff9ccfd/kxab039f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba87/10449015/a2da583ff278/kxab039f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba87/10449015/15df3860362e/kxab039f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba87/10449015/0ea9fe3a2112/kxab039f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba87/10449015/454c13a44b2b/kxab039f6.jpg
