Gene length corrected trimmed mean of M-values (GeTMM) processing of RNA-seq data performs similarly in intersample analyses while improving intrasample comparisons.

Affiliations

Department of Medical Oncology, Erasmus MC Cancer Institute, Erasmus MC University Medical Center, 3015 CE, Rotterdam, The Netherlands.

Department of Surgery, Erasmus MC University Medical Center, 3015 CE, Rotterdam, The Netherlands.

Publication information

BMC Bioinformatics. 2018 Jun 22;19(1):236. doi: 10.1186/s12859-018-2246-7.

Abstract

BACKGROUND

Current normalization methods for RNA-sequencing data allow either intersample comparison, to identify differentially expressed (DE) genes, or intrasample comparison, for the discovery and validation of gene signatures. Studies on optimizing normalization methods typically validate them on simulated data. We describe a new method, GeTMM, which allows both inter- and intrasample analyses with the same normalized data set. We used actual (i.e., not simulated) RNA-seq data from 263 colon cancers (no biological replicates) and the same read count data to compare GeTMM with the most commonly used normalization methods, namely TMM (used by edgeR), RLE (used by DESeq2) and TPM, with respect to distributions, effect of RNA quality, subtype classification, recurrence score, recall of DE genes and correlation to RT-qPCR data.

RESULTS

We observed a clear benefit for GeTMM and TPM in intrasample comparison, while GeTMM performed similarly to TMM- and RLE-normalized data in intersample comparisons. Recall of DE genes was comparable among the normalization methods, while GeTMM showed the lowest number of false-positive DE genes. Remarkably, we observed only limited detrimental effects in samples with low RNA quality.

CONCLUSIONS

We show that GeTMM outperforms established methods with regard to intrasample comparison while performing equivalently with regard to intersample normalization, using the same normalized data. These combined properties enhance not only the general usefulness of RNA-seq but also its comparability with the many array-based gene expression data sets in the public domain.
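
The abstract names the method but does not spell out the computation. As a rough illustration only, the sketch below assumes that GeTMM amounts to dividing raw counts by gene length in kilobases (RPK) and then applying TMM-style scaling to those values; that reading is an assumption based on the method's name, not something the abstract states. The TMM step is deliberately simplified (an unweighted trimmed mean of log ratios against a single reference sample) and is not edgeR's implementation. All function names and the toy data are hypothetical.

```python
import numpy as np


def rpk(counts, lengths_bp):
    """Reads per kilobase: counts (genes x samples) divided by gene length in kb."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1000.0
    return counts / lengths_kb[:, None]


def tpm(counts, lengths_bp):
    """Transcripts per million: RPK scaled so each sample sums to 1e6."""
    r = rpk(counts, lengths_bp)
    return r / r.sum(axis=0) * 1e6


def simple_tmm_factors(x, ref=0, trim=0.3):
    """Simplified TMM-style factors (ASSUMPTION: not edgeR's algorithm).

    For each sample, take the log2 ratios (M-values) of library-size-scaled
    values against a reference sample, trim both tails, and average.
    """
    lib = x.sum(axis=0)
    factors = np.ones(x.shape[1])
    for j in range(x.shape[1]):
        keep = (x[:, j] > 0) & (x[:, ref] > 0)  # skip genes with zeros
        m = np.log2((x[keep, j] / lib[j]) / (x[keep, ref] / lib[ref]))
        m.sort()
        k = int(len(m) * trim)
        trimmed = m[k:len(m) - k] if len(m) > 2 * k else m
        factors[j] = 2.0 ** trimmed.mean()
    # Rescale so the factors have a geometric mean of 1.
    return factors / np.exp(np.mean(np.log(factors)))


def getmm_like(counts, lengths_bp):
    """Length-corrected, TMM-style scaled values per million (illustrative)."""
    r = rpk(counts, lengths_bp)
    f = simple_tmm_factors(r)
    return r / (r.sum(axis=0) * f) * 1e6


# Toy data (hypothetical): 4 genes x 2 samples, gene lengths in base pairs.
toy_counts = np.array([[10, 20], [20, 40], [30, 60], [40, 80]])
toy_lengths = np.array([1000, 2000, 3000, 4000])
print(tpm(toy_counts, toy_lengths))
print(getmm_like(toy_counts, toy_lengths))
```

With real data, the scaling step would normally be delegated to edgeR's `calcNormFactors` rather than this simplified stand-in. Because the values are divided by gene length before scaling, they remain comparable between genes within a sample, which is the property the abstract attributes to both GeTMM and TPM.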

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ab2/6013957/e69a6caa837e/12859_2018_2246_Fig1_HTML.jpg
