Department of Biostatistics, University of Iowa, Iowa City, IA 52242, USA.
Nucleic Acids Res. 2012 Apr;40(8):e61. doi: 10.1093/nar/gkr1291. Epub 2012 Jan 20.
Ultra-deep RNA sequencing has become a powerful approach for genome-wide analysis of pre-mRNA alternative splicing. We develop MATS (multivariate analysis of transcript splicing), a Bayesian statistical framework for flexible hypothesis testing of differential alternative splicing patterns on RNA-Seq data. MATS uses a multivariate uniform prior to model the between-sample correlation in exon splicing patterns, and a Markov chain Monte Carlo (MCMC) method coupled with a simulation-based adaptive sampling procedure to calculate the P-value and false discovery rate (FDR) of differential alternative splicing. Importantly, the MATS approach is applicable to almost any type of null hypothesis of interest, providing the flexibility to identify differential alternative splicing events that match a given user-defined pattern. We evaluated the performance of MATS using simulated and real RNA-Seq data sets. In the RNA-Seq analysis of alternative splicing events regulated by the epithelial-specific splicing factor ESRP1, we obtained a high RT-PCR validation rate of 86% for differential exon skipping events with a MATS FDR of <10%. Additionally, over the full list of RT-PCR tested exons, the MATS FDR estimates matched well with the experimental validation rate. Our results demonstrate that MATS is an effective and flexible approach for detecting differential alternative splicing from RNA-Seq data.
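The kind of user-defined hypothesis described above can be illustrated with a minimal sketch. This is not the MATS implementation: it replaces the multivariate uniform prior with two independent uniform priors (so it ignores between-sample correlation), replaces MCMC and adaptive sampling with plain Monte Carlo draws from Beta posteriors, and treats raw inclusion and skipping read counts as binomial. The function name `posterior_prob_diff`, its parameters, and the 0.1 cutoff are hypothetical choices made for illustration only.

```python
import random


def posterior_prob_diff(inc1, skip1, inc2, skip2, cutoff=0.1, draws=20000, seed=0):
    """Estimate P(|psi1 - psi2| > cutoff | counts) by Monte Carlo.

    psi is the exon inclusion level of one sample. ASSUMPTION: each sample
    gets an independent uniform Beta(1, 1) prior, which is a simplification
    and not the correlated multivariate prior described in the abstract.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # Uniform prior + binomial read counts -> Beta posterior per sample.
        psi1 = rng.betavariate(inc1 + 1, skip1 + 1)
        psi2 = rng.betavariate(inc2 + 1, skip2 + 1)
        if abs(psi1 - psi2) > cutoff:
            hits += 1
    return hits / draws


if __name__ == "__main__":
    # Hypothetical read counts: (inclusion, skipping) reads in two samples.
    print(posterior_prob_diff(90, 10, 20, 80))
    print(posterior_prob_diff(50, 50, 50, 50))
```

Changing the condition inside the loop changes the hypothesis being tested, for example requiring `psi1 - psi2 > cutoff` to look only for exons with higher inclusion in the first sample.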