Detrimental effects of duplicate reads and low complexity regions on RNA- and ChIP-seq data.

Author information

Dozmorov Mikhail G, Adrianto Indra, Giles Cory B, Glass Edmund, Glenn Stuart B, Montgomery Courtney, Sivils Kathy L, Olson Lorin E, Iwayama Tomoaki, Freeman Willard M, Lessard Christopher J, Wren Jonathan D

Publication information

BMC Bioinformatics. 2015;16 Suppl 13(Suppl 13):S10. doi: 10.1186/1471-2105-16-S13-S10. Epub 2015 Sep 25.

Abstract

BACKGROUND

Adapter trimming and removal of duplicate reads are common practices in next-generation sequencing pipelines. Sequencing reads ambiguously mapped to repetitive and low complexity regions can also be problematic for accurate assessment of the biological signal, yet their impact on sequencing data has not received much attention. We investigate how trimming the adapters, removing duplicates, and filtering out reads overlapping low complexity regions influence the significance of biological signal in RNA- and ChIP-seq experiments.

METHODS

We assessed the effect of data processing steps on the alignment statistics and the functional enrichment analysis results of RNA- and ChIP-seq data. We compared differentially processed RNA-seq data with matching microarray data from the same patient samples to determine whether changes in pre-processing improved correlation between the two. We developed a simple tool, RepeatSoaker, to remove reads overlapping low complexity regions. It is available at https://github.com/mdozmorov/RepeatSoaker, and we tested its effect on the alignment statistics and the results of the enrichment analyses.
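The filtering step described above can be illustrated with a minimal sketch. This is not RepeatSoaker's actual implementation; it only shows the general idea of discarding reads that overlap annotated low complexity regions. It assumes reads and regions are given as (chromosome, start, end) tuples with 0-based, half-open coordinates, and the names `build_index`, `overlap_fraction`, `filter_reads`, and the `max_fraction` threshold are hypothetical, introduced here for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(regions):
    """Merge overlapping (chrom, start, end) regions and sort them per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                # Overlapping or adjacent: extend the previous interval.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlap_fraction(index, chrom, start, end):
    """Return the fraction of the read [start, end) covered by indexed regions."""
    if chrom not in index or end <= start:
        return 0.0
    starts, ends = index[chrom]
    # Begin at the last merged region that starts at or before the read start.
    i = max(bisect_right(starts, start) - 1, 0)
    covered = 0
    while i < len(starts) and starts[i] < end:
        covered += max(0, min(end, ends[i]) - max(start, starts[i]))
        i += 1
    return covered / (end - start)


def filter_reads(reads, regions, max_fraction=0.0):
    """Keep reads whose covered fraction does not exceed max_fraction (hypothetical threshold)."""
    index = build_index(regions)
    return [r for r in reads
            if overlap_fraction(index, r[0], r[1], r[2]) <= max_fraction]


# Toy data, invented for illustration only.
low_complexity = [("chr1", 100, 200), ("chr1", 150, 250), ("chr2", 0, 50)]
reads = [("chr1", 50, 100), ("chr1", 90, 110), ("chr1", 240, 340),
         ("chr2", 40, 60), ("chr3", 0, 10)]
kept = filter_reads(reads, low_complexity)
```

With the default `max_fraction=0.0`, any read that touches a low complexity region is dropped, which corresponds to the most aggressive setting; raising the threshold keeps reads that overlap only slightly.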

RESULTS

Both adapter trimming and duplicate removal moderately improved the strength of biological signals in RNA-seq and ChIP-seq data. Aggressive filtering of reads overlapping low complexity regions, as defined by RepeatMasker, further improved both the strength of biological signals and the correlation between RNA-seq and microarray gene expression data.

CONCLUSIONS

Adapter trimming and duplicate removal, coupled with filtering out reads overlapping low complexity regions, increase the quality and reliability of detecting biological signals in RNA-seq and ChIP-seq data.
