Shen Shihao, Park Juw Won, Lu Zhi-xiang, Lin Lan, Henry Michael D, Wu Ying Nian, Zhou Qing, Xing Yi
Departments of Microbiology, Immunology, & Molecular Genetics and.
Departments of Molecular Physiology and Biophysics and Pathology, University of Iowa, Iowa City, IA 52242.
Proc Natl Acad Sci U S A. 2014 Dec 23;111(51):E5593-601. doi: 10.1073/pnas.1419161111. Epub 2014 Dec 5.
Ultra-deep RNA sequencing (RNA-Seq) has become a powerful approach for genome-wide analysis of pre-mRNA alternative splicing. We previously developed multivariate analysis of transcript splicing (MATS), a statistical method for detecting differential alternative splicing between two RNA-Seq samples. Here we describe a new statistical model and computer program, replicate MATS (rMATS), designed for detection of differential alternative splicing from replicate RNA-Seq data. rMATS uses a hierarchical model to simultaneously account for sampling uncertainty in individual replicates and variability among replicates. In addition to the analysis of unpaired replicates, rMATS also includes a model specifically designed for paired replicates between sample groups. The hypothesis-testing framework of rMATS is flexible and can assess the statistical significance over any user-defined magnitude of splicing change. The performance of rMATS is evaluated by the analysis of simulated and real RNA-Seq data. rMATS outperformed two existing methods for replicate RNA-Seq data in all simulation settings, and RT-PCR yielded a high validation rate (94%) in an RNA-Seq dataset of prostate cancer cell lines. Our data also provide guiding principles for designing RNA-Seq studies of alternative splicing. We demonstrate that it is essential to incorporate biological replicates in the study design. Of note, pooling RNAs or merging RNA-Seq data from multiple replicates is not an effective approach to account for variability, and the result is particularly sensitive to outliers. The rMATS source code is freely available at rnaseq-mats.sourceforge.net/. As the popularity of RNA-Seq continues to grow, we expect rMATS will be useful for studies of alternative splicing in diverse RNA-Seq projects.
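The abstract says that pooling RNAs or merging RNA-Seq data from replicates is sensitive to outliers. A small numeric sketch of exon inclusion levels (PSI) makes this concrete. It is not the rMATS hierarchical model. It only contrasts one PSI computed from merged read counts with PSI values computed separately for each replicate. The 2:1 effective-length ratio, the read counts, and all function names below are hypothetical, chosen for illustration.

```python
import statistics
from dataclasses import dataclass

# Hypothetical effective-length weights (an assumption, not rMATS's values):
# with junction reads only, the inclusion isoform has two junctions and the
# skipping isoform one, so a 2:1 ratio is assumed here.
INCLUSION_LEN = 2.0
SKIPPING_LEN = 1.0


@dataclass
class Replicate:
    inclusion_reads: int  # reads supporting exon inclusion
    skipping_reads: int   # reads supporting exon skipping


def psi(inc: float, skip: float) -> float:
    """Length-normalized exon inclusion level (PSI) from read counts."""
    inc_rate = inc / INCLUSION_LEN
    skip_rate = skip / SKIPPING_LEN
    return inc_rate / (inc_rate + skip_rate)


def pooled_psi(reps: list[Replicate]) -> float:
    """Merge counts across replicates first, then compute a single PSI."""
    inc = sum(r.inclusion_reads for r in reps)
    skip = sum(r.skipping_reads for r in reps)
    return psi(inc, skip)


def per_replicate_psi(reps: list[Replicate]) -> list[float]:
    """Compute PSI separately for each replicate, keeping their spread."""
    return [psi(r.inclusion_reads, r.skipping_reads) for r in reps]


# Made-up counts: two shallow replicates agree, one deep replicate is an outlier.
replicates = [
    Replicate(inclusion_reads=20, skipping_reads=40),   # PSI 0.2
    Replicate(inclusion_reads=20, skipping_reads=40),   # PSI 0.2
    Replicate(inclusion_reads=900, skipping_reads=50),  # PSI 0.9, deep outlier
]

values = per_replicate_psi(replicates)
print(round(pooled_psi(replicates), 3))    # 0.783: dominated by the deep replicate
print(round(statistics.mean(values), 3))   # 0.433: each replicate weighted equally
print(round(statistics.stdev(values), 3))  # between-replicate spread that merging hides
```

Merging counts weights each replicate by its sequencing depth. A single deep outlier therefore pulls the merged estimate to about 0.78, even though two of the three replicates sit at 0.2. Per-replicate values keep the between-replicate variability that a replicate-aware model such as rMATS is designed to account for.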