Read trimming is not required for mapping and quantification of RNA-seq reads at the gene level.

Author information

Liao Yang, Shi Wei

Affiliations

Olivia Newton-John Cancer Research Institute, Heidelberg, Victoria 3084, Australia.

Publication information

NAR Genom Bioinform. 2020 Sep 3;2(3):lqaa068. doi: 10.1093/nargab/lqaa068. eCollection 2020 Sep.

Abstract

RNA sequencing (RNA-seq) is currently the standard method for genome-wide expression profiling. RNA-seq reads often need to be mapped to a reference genome before read counts can be produced for genes. Read trimming methods have been developed to assist read mapping by removing adapter sequences and low-sequencing-quality bases. It is, however, unclear what impact read trimming has on the quantification of RNA-seq data, an important task in RNA-seq data analysis. In this study, we used a benchmark RNA-seq dataset and simulation data to assess the impact of read trimming on mapping and quantification of RNA-seq reads. We found that adapter sequences can be effectively removed by the read aligner via 'soft-clipping' and that many low-sequencing-quality bases, which would be removed by read trimming tools, were rescued by the aligner. Accuracy of gene expression quantification using untrimmed reads was found to be comparable to or slightly better than that using trimmed reads, based on Pearson correlation with reverse transcriptase-polymerase chain reaction data and simulation truth. Total data analysis time was reduced by up to an order of magnitude when read trimming was not performed. Our study suggests that read trimming is a redundant process in the quantification of RNA-seq expression data.
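
As an illustration of what 'soft-clipping' looks like in aligner output, the sketch below reads the CIGAR string of a SAM alignment record and counts the bases the aligner left unaligned at each end of the read (the `S` operation). It is a minimal sketch and not the authors' pipeline: the function name `soft_clipped_bases` is hypothetical, and the CIGAR strings in the usage lines are made-up examples rather than data from the study.

```python
import re
from typing import Tuple

# A CIGAR string is a sequence of <length><operation> pairs (SAM format).
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar: str) -> Tuple[int, int]:
    """Return (leading, trailing) soft-clipped base counts for one CIGAR string.

    Hypothetical helper for illustration only.
    """
    ops = [(int(length), op) for length, op in _CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips at either end; ignore them.
    core = [(length, op) for length, op in ops if op != "H"]
    if not core:
        # Covers an unavailable CIGAR ("*") or an empty string.
        return (0, 0)
    leading = core[0][0] if core[0][1] == "S" else 0
    trailing = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return (leading, trailing)


# Made-up examples: a 100-base read whose last 10 bases (for instance,
# adapter sequence) were soft-clipped, and a read that aligned end to end.
print(soft_clipped_bases("90M10S"))
print(soft_clipped_bases("100M"))
```

Soft-clipped bases stay in the read record but are excluded from the alignment, which is why an adapter at the end of a read need not prevent the remainder of the read from being mapped and counted.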
