Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestr 73, 14195 Berlin, Germany.
Nucleic Acids Res. 2010 Jun;38(10):e112. doi: 10.1093/nar/gkq041. Epub 2010 Feb 11.
Alternative splicing, polyadenylation of pre-messenger RNA molecules and differential promoter usage can produce a variety of transcript isoforms whose respective expression levels are regulated in time and space, thus contributing to specific biological functions. However, the repertoire of mammalian alternative transcripts and their regulation are still poorly understood. Second-generation sequencing now opens unprecedented routes to the analysis of entire transcriptomes. Here, we developed methods that allow the prediction and quantification of alternative isoforms derived solely from exon expression levels in RNA-Seq data. These methods are based on an explicit statistical model and enable the prediction of alternative isoforms within or between conditions using any known gene annotation, as well as the relative quantification of known transcript structures. Applying these methods to a human RNA-Seq dataset, we validated a significant fraction of the predictions by RT-PCR. The data further showed that these predictions correlated well with information originating from junction reads. A direct comparison with exon arrays indicated improved performance of RNA-Seq over microarrays in the prediction of skipped exons. Altogether, the set of methods presented here comprehensively addresses multiple aspects of alternative isoform analysis. The software is available as an open-source R package called Solas at http://cmb.molgen.mpg.de/2ndGenerationSequencing/Solas/.
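To illustrate the general idea of predicting alternative isoforms from exon expression levels alone, the following is a minimal sketch in Python. It is not the statistical model implemented in Solas, which the abstract does not spell out. It only compares each exon's read density with the median density of the gene's exons and flags exons that fall well below it as candidate skipped exons. The function names, the reads-per-kilobase measure and the `max_ratio` threshold are assumptions made for this example.

```python
from statistics import median


def exon_densities(read_counts, exon_lengths):
    """Return the read density (reads per kilobase) of each exon."""
    return [1000.0 * count / length
            for count, length in zip(read_counts, exon_lengths)]


def candidate_skipped_exons(read_counts, exon_lengths, max_ratio=0.5):
    """Return indices of exons whose density is low relative to the gene.

    An exon is flagged when its density divided by the median exon density
    of the gene is at most `max_ratio` (a hypothetical cut-off, not a value
    taken from the paper).
    """
    densities = exon_densities(read_counts, exon_lengths)
    reference = median(densities)
    if reference == 0:
        # Gene is essentially not expressed; nothing can be inferred.
        return []
    return [index for index, density in enumerate(densities)
            if density / reference <= max_ratio]


# Toy gene with five exons; the numbers are invented for illustration.
counts = [200, 210, 40, 190, 400]
lengths = [1000, 1000, 500, 1000, 2000]
print(candidate_skipped_exons(counts, lengths))
```

In this toy gene the third exon has a density well below that of its neighbours and is the only one flagged. A model-based approach such as the one described in the abstract would in addition account for sampling noise in the read counts rather than rely on a fixed ratio.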