Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720.
Center for RNA Systems Biology, University of California, Berkeley, CA 94720.
Proc Natl Acad Sci U S A. 2018 Aug 28;115(35):E8181-E8190. doi: 10.1073/pnas.1806018115. Epub 2018 Aug 13.
Alternative pre-mRNA splicing (AS) greatly diversifies metazoan transcriptomes and proteomes and is crucial for gene regulation. Current computational methods for analyzing AS from Illumina RNA-sequencing data rely on preannotated libraries of known spliced transcripts, which hinders AS analysis in poorly annotated genomes and can further mask unknown AS patterns. To address this critical bioinformatics problem, we developed a method called the junction usage model (JUM) that uses a bottom-up approach to identify, analyze, and quantitate global AS profiles without any prior transcriptome annotations. JUM accurately reports global AS changes in terms of the five conventional AS patterns and an additional "composite" category composed of inseparable combinations of conventional patterns. JUM stringently classifies the difficult and disease-relevant pattern of intron retention (IR), reducing the false positive rate of IR detection commonly seen in other annotation-based methods to near-negligible rates. When analyzing AS in RNA samples derived from Drosophila heads, human tumors, and human cell lines bearing cancer-associated splicing factor mutations, JUM consistently identified approximately twice the number of novel AS events missed by other methods. Computational simulations showed that JUM exhibits a 1.2 to 4.8 times higher true positive rate at a fixed cutoff of 5% false discovery rate. In summary, JUM provides a framework and improved method that removes the necessity for transcriptome annotations and enables the detection, analysis, and quantification of AS patterns in complex metazoan transcriptomes with superior accuracy.
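The benchmark quoted in the abstract, true positive rate at a fixed 5% false discovery rate cutoff, can be illustrated with a minimal Python sketch. This is not JUM's code. It assumes that each simulated AS event carries a raw p-value and a ground-truth label, and that the cutoff is applied to Benjamini-Hochberg adjusted p-values; the abstract does not state which correction was used. The function names `benjamini_hochberg` and `tpr_at_fdr` are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used for illustration only; the abstract does not
    name the multiple-testing correction.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def tpr_at_fdr(pvalues, is_true_event, fdr_cutoff=0.05):
    """Fraction of truly changed events that are called at the FDR cutoff."""
    adjusted = benjamini_hochberg(pvalues)
    called = [q <= fdr_cutoff for q in adjusted]
    true_positives = sum(1 for c, t in zip(called, is_true_event) if c and t)
    total_true = sum(1 for t in is_true_event if t)
    return true_positives / total_true if total_true else 0.0


# Hypothetical toy data: four simulated events, three of them truly changed.
toy_pvalues = [0.001, 0.002, 0.5, 0.8]
toy_truth = [True, True, True, False]
toy_tpr = tpr_at_fdr(toy_pvalues, toy_truth)
```

With these toy values, two of the three truly changed events pass the 5% cutoff. Comparing this quantity between two methods on the same simulated data gives a fold difference of the kind the abstract reports.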