Department of Biostatistics, Epidemiology and Informatics.
Renal Electrolyte and Hypertension Division, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA.
Bioinformatics. 2018 Jul 15;34(14):2384-2391. doi: 10.1093/bioinformatics/bty097.
Alternative splicing and alternative transcription are major mechanisms for generating transcriptome diversity. Differential alternative splicing and transcription (DAST), which describes differential usage of transcript isoforms across conditions, can complement differential expression in characterizing gene regulation. However, the analysis of DAST is challenging because only a small fraction of RNA-seq reads is informative for isoforms. Several methods have been developed to detect exon-based and gene-based DAST, but they suffer from power loss for genes with many isoforms.
We present PennDiff, a novel statistical method that uses information on gene structures and pre-estimated isoform relative abundances to detect DAST from RNA-seq data. PennDiff has several advantages. First, grouping exons avoids multiple testing for 'exons' originating from the same isoform(s). Second, it uses all available reads in exon-inclusion level estimation, unlike methods that use only junction reads. Third, collapsing isoforms that share the same alternative exons reduces the impact of isoform expression estimation uncertainty. PennDiff can detect DAST at both the exon and gene levels, thus offering more flexibility than existing methods. Simulations and analysis of a real RNA-seq dataset indicate that PennDiff has a well-controlled type I error rate and is more powerful than existing methods, including DEXSeq, rMATS, Cuffdiff, IUTA and SplicingCompass. As the popularity of RNA-seq continues to grow, we expect PennDiff to be useful for diverse transcriptomics studies.
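The exon grouping and isoform collapsing described above can be illustrated with a minimal sketch. This is not PennDiff's implementation: the function name `exon_group_inclusion`, the toy gene structure, and the abundance values are all hypothetical, and the sketch assumes that the inclusion level of an exon group is simply the summed relative abundance of the isoforms containing it. Exons used by exactly the same set of isoforms are placed in one group, so each group would be tested once rather than once per exon.

```python
from collections import defaultdict


def exon_group_inclusion(isoform_exons, abundances):
    """Group exons by the set of isoforms that contain them.

    Hypothetical helper for illustration only (not part of PennDiff).
    isoform_exons: dict mapping isoform ID -> set of exon IDs
    abundances:    dict mapping isoform ID -> pre-estimated relative abundance
    Returns a dict mapping each exon group (tuple of exon IDs) to its
    inclusion level, assumed here to be the summed relative abundance of
    the isoforms that contain the group.
    """
    total = sum(abundances[iso] for iso in isoform_exons)

    # Record which isoforms contain each exon.
    membership = defaultdict(set)
    for iso, exons in isoform_exons.items():
        for exon in exons:
            membership[exon].add(iso)

    # Exons with identical isoform membership form one group.
    groups = defaultdict(list)
    for exon, isos in membership.items():
        groups[frozenset(isos)].append(exon)

    # Isoforms sharing a group are collapsed by summing their abundances.
    return {
        tuple(sorted(exons)): sum(abundances[iso] for iso in isos) / total
        for isos, exons in groups.items()
    }


# Toy gene with three isoforms (all values are made up for illustration).
toy_isoforms = {
    "T1": {"E1", "E2", "E3"},
    "T2": {"E1", "E3"},
    "T3": {"E1", "E2", "E3", "E4"},
}
toy_abundances = {"T1": 0.5, "T2": 0.3, "T3": 0.2}

for exon_group, level in sorted(exon_group_inclusion(toy_isoforms, toy_abundances).items()):
    print(exon_group, round(level, 3))
```

In this toy gene, E1 and E3 occur in every isoform and fall into a single group, while E2 (present in T1 and T3) and E4 (present only in T3) each form their own group with inclusion levels of roughly 0.7 and 0.2.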
PennDiff source code and user guide are freely available for download at https://github.com/tigerhu15/PennDiff.
Supplementary data are available at Bioinformatics online.