Division of Biostatistics, Dan L. Duncan Cancer Center, Baylor College of Medicine, Houston, Texas, USA.
PLoS One. 2010 Jan 8;5(1):e8529. doi: 10.1371/journal.pone.0008529.
Deep sequencing of the transcriptome (RNA-seq) provides an unprecedented opportunity to interrogate plausible mRNA splicing patterns by mapping RNA-seq reads to exon junctions (hereafter, junction reads). In most previous studies, exon junctions were detected using only the quantitative information of junction reads. This quantitative criterion (e.g. a minimum of two junction reads), although straightforward and widely used, usually results in high false positive and false negative rates, owing to the complexity of the transcriptome. Here, we introduce a new metric, Minimal Match on Either Side of exon junction (MMES), to measure the quality of each junction read, and implement an empirical statistical model based on it to detect exon junctions. When applied to a large dataset (>200M reads) consisting of mouse brain, liver and muscle mRNA sequences, with independent transcript databases as positive controls, our method proved considerably more accurate than previous ones, especially for detecting junctions originating from low-abundance transcripts. Our results were also confirmed by real-time RT-PCR assay. The MMES metric can be used either in this empirical statistical model or in other, more sophisticated classifiers, such as logistic regression.
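As an illustration of the MMES idea, the following is a minimal Python sketch, not the authors' implementation. It assumes an ungapped alignment in which the first bases of a junction read lie on the upstream exon and the remaining bases on the downstream exon, and it takes MMES to be the smaller of the two per-side match counts. The function names (`count_matches`, `mmes`, `passes_count_criterion`, `passes_mmes_threshold`) and the threshold value are hypothetical; the simple threshold filter stands in for the empirical statistical model, which the abstract does not specify.

```python
from typing import Iterable


def count_matches(read_part: str, ref_part: str) -> int:
    """Count positions where the read base equals the reference base."""
    return sum(r == g for r, g in zip(read_part, ref_part))


def mmes(read: str, left_ref: str, right_ref: str) -> int:
    """Minimal match on either side of the junction (assumed definition).

    left_ref  : reference bases of the upstream exon covered by the read
    right_ref : reference bases of the downstream exon covered by the read
    """
    split = len(left_ref)
    if len(read) != split + len(right_ref):
        raise ValueError("read length must equal the two flank lengths combined")
    left = count_matches(read[:split], left_ref)
    right = count_matches(read[split:], right_ref)
    return min(left, right)


def passes_count_criterion(mmes_scores: Iterable[int], min_reads: int = 2) -> bool:
    """Purely quantitative criterion: enough junction reads, quality ignored."""
    return len(list(mmes_scores)) >= min_reads


def passes_mmes_threshold(mmes_scores: Iterable[int], min_mmes: int = 8) -> bool:
    """Hypothetical quality filter: at least one read is well anchored on both sides."""
    return any(score >= min_mmes for score in mmes_scores)


if __name__ == "__main__":
    # Two reads that each overhang the junction by only 3 bases.
    scores = [
        mmes("ACGTACGTAC", "ACGTACG", "TAC"),
        mmes("ACGTACGTAC", "ACG", "TACGTAC"),
    ]
    print(scores)
    print(passes_count_criterion(scores), passes_mmes_threshold(scores))
```

In this toy case the junction satisfies the two-read count criterion even though neither read is anchored by more than three bases on its shorter side, which is the kind of weakly supported call a per-read quality metric is meant to expose.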