Department of Electrical and Computer Engineering, University of Washington, Seattle, WA, United States of America.
Division of Biology and Biological Engineering, Caltech, Pasadena, CA, United States of America.
PLoS One. 2020 Jun 2;15(6):e0232946. doi: 10.1371/journal.pone.0232946. eCollection 2020.
High-throughput sequencing of RNA (RNA-Seq) has become a staple of modern molecular biology, with applications not only in quantifying gene expression but also in isoform-level analysis of RNA transcripts. To enable isoform-level analysis, a transcriptome assembly algorithm stitches the observed short reads together into the corresponding transcripts. This task is complicated by alternative splicing, a mechanism by which the same gene may generate multiple distinct RNA transcripts. We develop a novel genome-guided transcriptome assembler, RefShannon, that exploits the varying abundances of the different transcripts to enable accurate reconstruction of the transcripts. Our evaluation shows that RefShannon improves sensitivity by up to 22% at a given specificity compared with other state-of-the-art assemblers. RefShannon is written in Python and is available from GitHub (https://github.com/shunfumao/RefShannon).