Research and Development Informatics, Amgen Inc., One Amgen Center Drive, Thousand Oaks, CA 91320, USA.
Bioinformatics. 2011 Jul 15;27(14):1922-8. doi: 10.1093/bioinformatics/btr310. Epub 2011 May 18.
Next-generation sequencing technology generates high-throughput data, which allows us to detect fusion genes at both the transcript and genomic levels. To detect fusion genes, current bioinformatics tools rely heavily on paired-end approaches and overlook the importance of reads that span fusion junctions. There is therefore a need for an efficient aligner that detects fusion events by accurately mapping these junction-spanning single reads, particularly as reads get longer with improvements in sequencing technology.
We present a novel method, FusionMap, which aligns fusion reads directly to the genome without prior knowledge of potential fusion regions. FusionMap can detect fusion events in both single- and paired-end datasets from either RNA-Seq or gDNA-Seq studies and characterize fusion junctions at base-pair resolution. We show that FusionMap achieved high sensitivity and specificity in fusion detection on two simulated RNA-Seq datasets containing 75 nt paired-end reads. FusionMap achieved substantially higher sensitivity and specificity than the paired-end approach when the inner distance between read pairs was small. Using FusionMap to characterize fusion genes in the K562 chronic myeloid leukemia cell line, we further demonstrated its accuracy in fusion detection in both single-end RNA-Seq and gDNA-Seq datasets. Together, these results show that FusionMap provides an accurate and systematic solution for detecting fusion events through junction-spanning reads.
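To make the idea of a junction-spanning read concrete, the following is a minimal toy sketch and not FusionMap's actual algorithm, which the abstract does not describe in detail. It assumes exact matching, forward-strand sequences only, and a tiny in-memory reference; the names `Junction`, `find_junctions` and `min_anchor` are hypothetical and introduced only for illustration. The sketch tries every split point of a read, looks for the left part in one reference sequence and the right part in a different one, and reports the breakpoint at base-pair resolution.

```python
from typing import Dict, List, NamedTuple


class Junction(NamedTuple):
    """Hypothetical record for one candidate fusion junction (0-based coordinates)."""
    left_ref: str     # reference sequence matched by the left part of the read
    left_end: int     # index of the last base of the left part on left_ref
    right_ref: str    # reference sequence matched by the right part of the read
    right_start: int  # index of the first base of the right part on right_ref
    split: int        # number of read bases assigned to the left part


def find_junctions(read: str, refs: Dict[str, str], min_anchor: int = 8) -> List[Junction]:
    """Toy exact-match search for junction-spanning reads.

    Assumption: each side of the junction must match a reference exactly
    and be at least `min_anchor` bases long. Only the first occurrence of
    each part in a reference is considered.
    """
    hits: List[Junction] = []
    # Try every split that leaves at least min_anchor bases on both sides.
    for split in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:split], read[split:]
        for left_name, left_seq in refs.items():
            left_pos = left_seq.find(prefix)
            if left_pos < 0:
                continue
            for right_name, right_seq in refs.items():
                if right_name == left_name:
                    continue  # this sketch only reports junctions between different references
                right_pos = right_seq.find(suffix)
                if right_pos >= 0:
                    hits.append(
                        Junction(left_name, left_pos + split - 1, right_name, right_pos, split)
                    )
    return hits
```

For a read built by joining a segment of one reference to a segment of another, `find_junctions` returns the split position and the coordinates on each side of the breakpoint. A practical aligner would additionally need an indexed genome, tolerance for mismatches, both strands, and filtering of reads that already align normally, none of which this sketch attempts.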
FusionMap includes reference indexing, read filtering, fusion alignment and reporting in one package. The software is free for noncommercial use at http://www.omicsoft.com/fusionmap.