Bioinformatics HUSAR, Genomics Proteomics Core Facility, German Cancer Research Center, Im Neuenheimer Feld 580, 69120 Heidelberg, Germany.
Bioinformatics. 2013 May 1;29(9):1141-8. doi: 10.1093/bioinformatics/btt101. Epub 2013 Feb 28.
Alternative splicing is central to cellular processes and substantially increases transcriptome and proteome diversity. Aberrant splicing events often have pathological consequences and are associated with various diseases and cancer types. Next-generation RNA sequencing (RNA-seq) makes it possible to analyse alternative splicing on a large scale. However, algorithms for analysing alternative splicing from short-read sequencing data are not yet fully established, and standard solutions are still lacking for a variety of data analysis tasks.
We present a new method and software to predict genes that are differentially spliced between two conditions using RNA-seq data. Our method uses geometric angles between the high-dimensional vectors of exon read counts. This allows differential splicing to be detected even when the splicing events are complex and involve previously unknown splicing patterns. We applied our approach to two case studies, including neuroblastoma tumour data with favourable and unfavourable clinical courses. We verified our predictions by several methods, including simulated experiments and complementary in silico analyses. For the predicted genes, we found a significant number of exons with specific regulatory splicing factor motifs and a substantial number of publications linking those genes to alternative splicing. Furthermore, we successfully exploited splicing information to cluster tissues and patients, which shows the applicability of our method to patient clustering. Finally, we found additional evidence of splicing diversity for many predicted genes in normalized read coverage plots and in reads that span exon-exon junctions.
SplicingCompass is licensed under the GNU GPL and freely available as a package in the statistical language R at http://www.ichip.de/software/SplicingCompass.html
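The angle-based idea in the abstract can be illustrated with a minimal sketch. This is not the SplicingCompass implementation (which is an R package) and it leaves out any statistical testing; it only shows, under simple assumptions, why an angle between exon read-count vectors responds to changes in relative exon usage but not to overall expression level. The function names `angle_between` and `mean_pairwise_angle` and the read counts are hypothetical and chosen for illustration.

```python
import math


def angle_between(u, v):
    """Angle in degrees between two exon read-count vectors of one gene."""
    if len(u) != len(v):
        raise ValueError("vectors must cover the same exons")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angle is undefined for an all-zero count vector")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


def mean_pairwise_angle(group_a, group_b):
    """Mean angle over all sample pairs drawn from two groups (illustrative summary)."""
    angles = [angle_between(u, v) for u in group_a for v in group_b]
    return sum(angles) / len(angles)


# Hypothetical read counts for one gene with four exons.
# Samples in condition A differ only in overall expression level.
condition_a = [[100, 80, 90, 110], [50, 40, 45, 55]]
# Samples in condition B have almost no reads on exon 2 (e.g. exon skipping).
condition_b = [[100, 5, 90, 110], [60, 2, 50, 65]]

within_a = angle_between(condition_a[0], condition_a[1])
between = mean_pairwise_angle(condition_a, condition_b)
print(f"within condition A: {within_a:.2f} degrees")
print(f"between A and B:    {between:.2f} degrees")
```

Because the angle depends only on the direction of a count vector, scaling all exon counts of a sample by the same factor leaves it unchanged, whereas losing reads on a single exon rotates the vector. A real analysis would still need to account for sequencing depth, exon length, and variability between replicates, none of which this sketch addresses.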