Jin Ying, Tam Oliver H, Paniagua Eric, Hammell Molly
Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA.
Bioinformatics. 2015 Nov 15;31(22):3593-9. doi: 10.1093/bioinformatics/btv422. Epub 2015 Jul 23.
Most RNA-seq data analysis software packages are not designed to handle the complexities of properly apportioning short sequencing reads to highly repetitive regions of the genome. These regions are often occupied by transposable elements (TEs), which make up between 20% and 80% of eukaryotic genomes. TEs can contribute a substantial fraction of transcriptomic and genomic sequencing reads, but they are typically ignored in analyses.
Here, we present a method and software package for including both gene- and TE-associated ambiguously mapped reads in differential expression analysis. Our method shows improved recovery of TE transcripts over other published expression analysis methods, in both synthetic data and qPCR/NanoString-validated published datasets.
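The abstract does not spell out how ambiguously mapped reads are apportioned. One common strategy for this problem is expectation-maximization (EM): each multi-mapped read is split fractionally across the features it aligns to, in proportion to their currently estimated abundance, and the estimates are refined iteratively. The sketch below is a simplified illustration of that general technique under stated assumptions, not the package's actual implementation. The function name `em_apportion` and its input format (a dict of unique counts plus a list of candidate-feature lists, one per multi-mapped read) are hypothetical, and the sketch ignores feature length and library-size normalization.

```python
def em_apportion(unique_counts, multi_reads, n_iter=100, tol=1e-6):
    """Hypothetical EM sketch: distribute multi-mapped reads across features.

    unique_counts: dict feature -> number of uniquely mapped reads (assumed input format)
    multi_reads:   list of lists; each inner list names the features one
                   ambiguously mapped read aligns to (assumed input format)
    Returns dict feature -> estimated (fractional) read count.
    Simplification: feature length is ignored when computing abundance.
    """
    features = set(unique_counts)
    for hits in multi_reads:
        features.update(hits)

    # Initialise: unique counts plus a uniform split of each multi-read.
    counts = {f: float(unique_counts.get(f, 0)) for f in features}
    for hits in multi_reads:
        for f in hits:
            counts[f] += 1.0 / len(hits)

    for _ in range(n_iter):
        total = sum(counts.values())
        if total == 0:
            break
        # Current relative abundance of each feature.
        abundance = {f: c / total for f, c in counts.items()}

        # E-step: assign each multi-read fractionally in proportion to abundance.
        # M-step: new counts = unique counts + expected multi-read assignments.
        new_counts = {f: float(unique_counts.get(f, 0)) for f in features}
        for hits in multi_reads:
            weight = sum(abundance[f] for f in hits)
            for f in hits:
                share = abundance[f] / weight if weight > 0 else 1.0 / len(hits)
                new_counts[f] += share

        delta = max(abs(new_counts[f] - counts[f]) for f in features)
        counts = new_counts
        if delta < tol:
            break
    return counts
```

For example, if ten reads map equally well to two TE copies and only one copy has uniquely mapped support, the iterations shift those ten reads almost entirely to the supported copy, while a gene with no ambiguous reads keeps its unique count unchanged and the total read count is conserved.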
The source code, associated GTF files for TE annotation, and testing data are freely available at http://hammelllab.labsites.cshl.edu/software.
Supplementary data are available at Bioinformatics online.