Tan Yukun, Mohanty Vakul, Liang Shaoheng, Dou Jinzhuang, Ma Jun, Kim Kun Hee, Bonder Marc Jan, Shi Xinghua, Lee Charles, Chong Zechen, Chen Ken
Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, 77030, USA.
Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany.
J Bioinform Syst Biol. 2023;6(2):74-81. doi: 10.26502/jbsb.5107050. Epub 2023 Apr 4.
We present novoRNABreak, a unified framework for detecting cancer-specific novel splice junctions and fusion transcripts in RNA-seq data from human cancer samples. novoRNABreak is based on a local assembly model, which offers a tradeoff between alignment-based methods and de novo whole transcriptome assembly (WTA) methods. This approach is accurate and sensitive in assembling novel junctions that are difficult to align directly or that have multiple alignments. It is also more efficient because it focuses on junctions rather than full-length transcripts. We demonstrate the performance of novoRNABreak through a comprehensive set of experiments using synthetic data generated from a reference genome, as well as real RNA-seq data from breast cancer and prostate cancer samples. The results show that the tool achieves better performance by fully utilizing unmapped reads and by precisely identifying junctions where short reads or small exons have multiple alignments. novoRNABreak is a fully fledged program available on GitHub (https://github.com/KChen-lab/novoRNABreak).