Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction.

Authors

Parker Hilary S, Leek Jeffrey T, Favorov Alexander V, Considine Michael, Xia Xiaoxin, Chavan Sameer, Chung Christine H, Fertig Elana J

Affiliations

Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21205, USA, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119333, Russia, Research Institute for Genetics and Selection of Industrial Microorganisms "GosNIIGenetika", Moscow 117545, Russia, Department of Statistics and Biostatistics, Rutgers University, NJ 08854, USA and Division of Allergy & Clinical Immunology, Department of Medicine, Johns Hopkins University, Baltimore, MD 21224, USA.

Publication information

Bioinformatics. 2014 Oct;30(19):2757-63. doi: 10.1093/bioinformatics/btu375. Epub 2014 Jun 6.

DOI:10.1093/bioinformatics/btu375
PMID:24907368
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4173013/
Abstract

MOTIVATION

Sample source, procurement process and other technical variations introduce batch effects into genomics data. Algorithms to remove these artifacts enhance differences between known biological covariates, but also carry potential concern of removing intragroup biological heterogeneity and thus any personalized genomic signatures. As a result, accurate identification of novel subtypes from batch-corrected genomics data is challenging using standard algorithms designed to remove batch effects for class comparison analyses. Nor can batch effects be corrected reliably in future applications of genomics-based clinical tests, in which the biological groups are by definition unknown a priori.

RESULTS

Therefore, we assess the extent to which various batch correction algorithms remove true biological heterogeneity. We also introduce an algorithm, permuted-SVA (pSVA), using a new statistical model that is blind to biological covariates to correct for technical artifacts while retaining biological heterogeneity in genomic data. This algorithm facilitated accurate subtype identification in head and neck cancer from gene expression data in both formalin-fixed and frozen samples. When applied to predict Human Papillomavirus (HPV) status, pSVA improved cross-study validation even if the sample batches were highly confounded with HPV status in the training set.

AVAILABILITY AND IMPLEMENTATION

All analyses were performed using R version 2.15.0. The code and data used to generate the results of this manuscript are available from https://sourceforge.net/projects/psva.
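
ILLUSTRATIVE SKETCH

The idea in RESULTS of a correction that is blind to biological covariates can be illustrated with a toy simulation. The sketch below is not pSVA and not the authors' R code. It assumes Python with NumPy, simulated data, and a plain per-gene least-squares regression on known batch indicators as a much simpler stand-in. The function names `remove_batch_means` and `group_gap`, the sample sizes, and the effect sizes are all hypothetical choices made for illustration.

```python
import numpy as np

def remove_batch_means(expr, batch):
    """Regress one-hot batch indicators out of every gene.

    expr  : genes x samples matrix of expression values
    batch : length n_samples vector of batch labels
    Biological labels are never passed in, so the correction is blind to them.
    """
    batch = np.asarray(batch)
    levels = np.unique(batch)
    # samples x batches design matrix of 0/1 indicators
    design = (batch[:, None] == levels[None, :]).astype(float)
    # least-squares fit per gene; with this design the fit is the batch mean
    coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    fitted = (design @ coef).T
    # subtract the batch-specific fit and restore each gene's overall mean
    return expr - fitted + expr.mean(axis=1, keepdims=True)

def group_gap(expr, labels, genes):
    """Mean difference (label 1 minus label 0), averaged over the given genes."""
    labels = np.asarray(labels)
    sub = expr[genes]
    diff = sub[:, labels == 1].mean(axis=1) - sub[:, labels == 0].mean(axis=1)
    return float(diff.mean())

# Simulated toy data (all sizes and effect sizes are illustrative assumptions)
rng = np.random.default_rng(0)
n_genes, n_samples = 200, 40
batch = np.tile([0, 1], n_samples // 2)       # alternating batches
subtype = np.repeat([0, 1], n_samples // 2)   # hidden subtype, balanced across batches
expr = rng.normal(0.0, 0.5, size=(n_genes, n_samples))
expr[:, batch == 1] += 3.0                    # technical shift on every gene
expr[:20, subtype == 1] += 2.0                # biological signal on the first 20 genes

corrected = remove_batch_means(expr, batch)
all_genes = np.arange(n_genes)
signal_genes = np.arange(20)

batch_gap_before = group_gap(expr, batch, all_genes)
batch_gap_after = group_gap(corrected, batch, all_genes)
subtype_gap_after = group_gap(corrected, subtype, signal_genes)
print(batch_gap_before, batch_gap_after, subtype_gap_after)
```

In this balanced design the batch shift is removed while the hidden subtype difference survives, because each subtype contributes equally to both batch means. If the subtype labels coincided with the batch labels, the same regression would remove the subtype signal as well. That confounded case is the one the abstract singles out, and this stand-in does not address it.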
