Unifying cancer and normal RNA sequencing data from different sources.

Affiliations

Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA.

Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA.

Publication information

Sci Data. 2018 Apr 17;5:180061. doi: 10.1038/sdata.2018.61.

Abstract

Driven by recent advances in next-generation sequencing (NGS) technologies and an urgent need to decode complex human diseases, a multitude of large-scale studies, such as the Genotype-Tissue Expression project (GTEx) and The Cancer Genome Atlas (TCGA), have produced an unprecedented volume of whole-transcriptome sequencing (RNA-seq) data. While these data offer new opportunities to identify the mechanisms underlying disease, comparing data from different sources remains challenging because of differences in sample and data processing. Here, we developed a pipeline that processes and unifies RNA-seq data from different studies; it includes uniform realignment, gene expression quantification, and batch effect removal. We find that uniform alignment and quantification are not sufficient when combining RNA-seq data from different sources and that the removal of other batch effects is essential to facilitate data comparison. We have processed data from GTEx and TCGA and successfully corrected for study-specific biases, enabling comparative analysis between TCGA and GTEx. The normalized datasets are available for download on figshare.
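
To illustrate what a study-specific bias looks like and how removing it changes the data, here is a minimal sketch. The abstract does not name the batch correction method used in the pipeline, so this is not the authors' approach: it is a plain per-gene mean-centering stand-in, and the function name `center_batches`, the toy values, and the labels `study_a` and `study_b` are all hypothetical.

```python
from statistics import mean

def center_batches(log_expr, batches):
    """Shift each batch so its per-gene mean matches the overall per-gene mean.

    log_expr: list of rows, one per gene, each holding log-scale expression
              values for every sample (hypothetical toy input).
    batches:  list of batch labels, one per sample (e.g. the source study).

    Assumption: this is simple mean-centering, chosen for illustration only.
    """
    corrected = []
    for gene_values in log_expr:
        grand_mean = mean(gene_values)
        # Mean of this gene within each batch.
        batch_means = {
            label: mean(v for v, b in zip(gene_values, batches) if b == label)
            for label in set(batches)
        }
        # Move every sample by the gap between its batch mean and the grand mean.
        corrected.append(
            [v + grand_mean - batch_means[b] for v, b in zip(gene_values, batches)]
        )
    return corrected

# Toy data: 2 genes, 4 samples; "study_b" carries a constant offset of +2.
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 5.0, 7.0, 7.0],
]
labels = ["study_a", "study_a", "study_b", "study_b"]
corrected = center_batches(expr, labels)
print(corrected)
```

After centering, the two toy studies share the same per-gene mean while the variation within each study is preserved. Note that this kind of centering is only reasonable when the batches are expected to be biologically comparable (for example, the same normal tissue type from two sources); applied to batches that differ biologically, it would also erase the real signal.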


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e41d/5903355/384d2268c6b5/sdata201861-f1.jpg
