Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction.

Author information

Parker Hilary S, Leek Jeffrey T, Favorov Alexander V, Considine Michael, Xia Xiaoxin, Chavan Sameer, Chung Christine H, Fertig Elana J

Affiliations

Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21205, USA, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119333, Russia, Research Institute for Genetics and Selection of Industrial Microorganisms "GosNIIGenetika", Moscow 117545, Russia, Department of Statistics and Biostatistics, Rutgers University, NJ 08854, USA and Division of Allergy & Clinical Immunology, Department of Medicine, Johns Hopkins University, Baltimore, MD 21224, USA.

Publication information

Bioinformatics. 2014 Oct;30(19):2757-63. doi: 10.1093/bioinformatics/btu375. Epub 2014 Jun 6.

Abstract

MOTIVATION

Sample source, procurement process and other technical variations introduce batch effects into genomics data. Algorithms that remove these artifacts enhance differences between known biological covariates, but they also risk removing intragroup biological heterogeneity and, with it, any personalized genomic signatures. As a result, accurate identification of novel subtypes from batch-corrected genomics data is challenging with standard algorithms, which are designed to remove batch effects for class comparison analyses. Nor can batch effects be corrected reliably in future applications of genomics-based clinical tests, in which the biological groups are by definition unknown a priori.

RESULTS

Therefore, we assess the extent to which various batch correction algorithms remove true biological heterogeneity. We also introduce an algorithm, permuted-SVA (pSVA), using a new statistical model that is blind to biological covariates to correct for technical artifacts while retaining biological heterogeneity in genomic data. This algorithm facilitated accurate subtype identification in head and neck cancer from gene expression data in both formalin-fixed and frozen samples. When applied to predict Human Papillomavirus (HPV) status, pSVA improved cross-study validation even if the sample batches were highly confounded with HPV status in the training set.
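The concern about confounded batches can be illustrated with a toy calculation. The sketch below is not pSVA and is not taken from the paper: it uses plain per-batch mean-centering as a simple stand-in for a correction that is blind to biological covariates, and the function name `center_by_batch` and all data values are hypothetical. It suggests that such a correction keeps a subtype difference when subtypes are spread across batches, but erases it when batch and subtype are fully confounded.

```python
import numpy as np

def center_by_batch(expr, batch):
    """Subtract each batch's per-gene mean from that batch's samples.

    expr  : (genes x samples) array of expression values
    batch : length-samples array of batch labels
    Plain batch mean-centering, used only as an illustration (NOT pSVA).
    """
    expr = np.asarray(expr, dtype=float)
    batch = np.asarray(batch)
    corrected = expr.copy()
    for b in np.unique(batch):
        cols = batch == b  # boolean mask selecting the samples of batch b
        corrected[:, cols] -= expr[:, cols].mean(axis=1, keepdims=True)
    return corrected

# Hypothetical toy data: one gene, six samples, two batches.
# Batch 2 carries a technical offset of +10 in both designs.
batch = np.array([1, 1, 1, 2, 2, 2])

# Balanced design: each batch holds two subtype-A samples (baseline 1)
# and one subtype-B sample (baseline 3).
balanced = np.array([[1.0, 1.0, 3.0, 11.0, 11.0, 13.0]])
balanced_corrected = center_by_batch(balanced, batch)

# Confounded design: batch 1 is all subtype A, batch 2 is all subtype B.
confounded = np.array([[1.0, 1.0, 1.0, 13.0, 13.0, 13.0]])
confounded_corrected = center_by_batch(confounded, batch)

# Subtype B minus subtype A after correction, in the balanced design.
balanced_gap = balanced_corrected[0, 2] - balanced_corrected[0, 0]
print("balanced subtype gap after correction:", balanced_gap)
print("confounded values after correction:", confounded_corrected[0])
```

In the balanced design the batch offset disappears while the two-unit subtype gap survives; in the confounded design every corrected value collapses to zero, so the biological difference is removed together with the batch effect.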

AVAILABILITY AND IMPLEMENTATION

All analyses were performed using R version 2.15.0. The code and data used to generate the results of this manuscript are available from https://sourceforge.net/projects/psva.


