A Novel Statistical Method to Diagnose, Quantify and Correct Batch Effects in Genomic Studies.

Affiliations

Division of Molecular Pathology, The Institute of Cancer Research, London, United Kingdom.

Centre for Molecular Pathology, Royal Marsden Hospital, London, United Kingdom.

Publication information

Sci Rep. 2017 Sep 7;7(1):10849. doi: 10.1038/s41598-017-11110-6.

DOI:10.1038/s41598-017-11110-6
PMID:28883548
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5589920/

Abstract

Genome projects now generate large-scale data, often produced at various time points by different laboratories using multiple platforms. This increases the potential for batch effects. Current batch evaluation methods, such as principal component analysis (PCA; mostly based on visual inspection), sometimes fail to reveal all of the underlying batch effects. These methods also carry the risk of unintentionally correcting biologically interesting factors that are attributed to batch effects. Here we propose a novel statistical method, finding batch effect (findBATCH), to evaluate batch effects based on probabilistic principal component and covariates analysis (PPCCA). The same framework also provides a new approach to batch correction, correcting batch effect (correctBATCH), which we have shown to be a better approach than traditional PCA-based correction. We demonstrate the utility of these methods on two examples (breast and colorectal cancers) by merging gene expression data from different studies after diagnosing and correcting for batch effects while retaining the biological effects. These methods, along with conventional visual inspection-based PCA, are available as part of an R package, exploring batch effect (exploBATCH; https://github.com/syspremed/exploBATCH).
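
To make the idea of PCA-based batch evaluation concrete, the following is a minimal sketch in Python with NumPy. It is not findBATCH, correctBATCH, PPCCA, or any part of the exploBATCH API. It simulates two batches (the data, the sample sizes, and the helper names `pca_scores` and `batch_separation` are all illustrative assumptions), replaces visual inspection of the PCA plot with a Welch t-statistic on each principal component's scores, and then applies naive per-batch mean-centering as a stand-in correction.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PC scores of a samples-by-genes matrix, via SVD of the column-centered data."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def batch_separation(scores, batch):
    """Welch t-statistic per component for the mean score difference between batch 0 and batch 1."""
    a, b = scores[batch == 0], scores[batch == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se

# Simulated expression data (assumption): 2 batches, 30 samples each, 200 genes.
rng = np.random.default_rng(0)
n_per_batch, n_genes = 30, 200
batch = np.repeat([0, 1], n_per_batch)
X = rng.normal(size=(2 * n_per_batch, n_genes))
X[batch == 1] += rng.normal(size=n_genes)  # gene-wise offset acting as the batch effect

# Diagnose: a large |t| on a component means that component separates the batches.
scores = pca_scores(X)
t_before = batch_separation(scores, batch)

# Naive correction (a stand-in, not correctBATCH): subtract each batch's gene-wise mean.
X_corr = X.copy()
for b in (0, 1):
    X_corr[batch == b] -= X_corr[batch == b].mean(axis=0)
t_after = batch_separation(pca_scores(X_corr), batch)

print("t-statistics before correction:", t_before)
print("t-statistics after correction: ", t_after)
```

Per-batch centering removes every mean difference between batches, so any biological factor confounded with batch is removed along with it. This is the risk the abstract describes for corrections that cannot separate batch from biology.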

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f047/5589920/40a38e317528/41598_2017_11110_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f047/5589920/b76ebc54987c/41598_2017_11110_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f047/5589920/1cdf1d6e18bc/41598_2017_11110_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f047/5589920/1eb8f0722c4b/41598_2017_11110_Fig4_HTML.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f047/5589920/6d94fbb2909e/41598_2017_11110_Fig5_HTML.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f047/5589920/a59793e0cfb6/41598_2017_11110_Fig6_HTML.jpg
