Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA.
Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, USA.
Bioinformatics. 2018 Jan 1;34(1):56-63. doi: 10.1093/bioinformatics/btx557.
Recent advances in high-throughput RNA sequencing (RNA-seq) technologies have made it possible to reconstruct the full transcriptome of many cell types. Accurate transcript assembly and isoform identification are important for understanding molecular mechanisms in biological systems.
We have developed a novel Bayesian method, SparseIso, to reliably identify spliced isoforms from RNA-seq data. A spike-and-slab prior is incorporated into the Bayesian model to enforce sparsity in isoform identification, effectively alleviating the problem of overfitting. A Gibbs sampling procedure is further developed to simultaneously identify and quantify transcripts from RNA-seq data. Because it samples, SparseIso estimates the joint distribution of all candidate transcripts. This significantly improves its performance in detecting lowly expressed transcripts and genes with multiple expressed isoforms. Both simulation studies and real data analyses demonstrate that SparseIso significantly outperforms existing methods in transcript assembly and isoform identification.
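The abstract does not give the model equations. The sketch below is only a rough illustration of how a spike-and-slab prior combined with Gibbs sampling selects a sparse set of candidate isoforms. It is not SparseIso's actual likelihood.

The sketch uses a plain Gaussian linear model, y = Xβ + ε, under these assumptions:

- y holds hypothetical per-segment coverage.
- X is a hypothetical 0/1 matrix recording which exon segments each candidate isoform contains.
- The noise variance σ² is assumed known.
- Abundances are not constrained to be non-negative, unlike a real isoform model.

Each β_j is exactly zero (spike) with probability 1 - π. Otherwise it is drawn from N(0, τ²) (slab). The function name `spike_slab_gibbs` and all parameter values are illustrative.

```python
import numpy as np


def spike_slab_gibbs(X, y, sigma2=1.0, tau2=10.0, pi=0.1,
                     n_iter=2000, burn_in=500, seed=0):
    """Single-site Gibbs sampler for y = X @ beta + N(0, sigma2 * I).

    Illustrative prior (an assumption, not SparseIso's model): beta_j = 0
    with probability 1 - pi (spike), otherwise beta_j ~ N(0, tau2) (slab).
    sigma2 is treated as known. Returns posterior inclusion probabilities
    and posterior means of beta, estimated from post-burn-in draws.
    """
    rng = np.random.default_rng(seed)
    _, n_iso = X.shape
    beta = np.zeros(n_iso)
    gamma = np.zeros(n_iso, dtype=bool)      # inclusion indicators
    resid = y - X @ beta                     # current residual y - X beta
    xtx = np.sum(X ** 2, axis=0)             # x_j' x_j for every column
    prior_log_odds = np.log(pi) - np.log1p(-pi)
    incl_count = np.zeros(n_iso)
    beta_sum = np.zeros(n_iso)

    for it in range(n_iter):
        for j in range(n_iso):
            # Partial residual with candidate isoform j's contribution removed.
            r_j = resid + X[:, j] * beta[j]
            # Conditional posterior of beta_j under the slab: N(mean, 1/prec).
            prec = xtx[j] / sigma2 + 1.0 / tau2
            mean = (X[:, j] @ r_j / sigma2) / prec
            # Log Bayes factor of slab vs spike, with beta_j integrated out.
            log_bf = -0.5 * np.log(tau2 * prec) + 0.5 * mean ** 2 * prec
            # Clip before exponentiating to avoid overflow warnings.
            log_odds = np.clip(prior_log_odds + log_bf, -500.0, 500.0)
            prob_in = 1.0 / (1.0 + np.exp(-log_odds))
            # Sample the indicator, then beta_j given the indicator.
            gamma[j] = rng.random() < prob_in
            beta[j] = rng.normal(mean, 1.0 / np.sqrt(prec)) if gamma[j] else 0.0
            resid = r_j - X[:, j] * beta[j]
        if it >= burn_in:
            incl_count += gamma
            beta_sum += beta

    n_kept = n_iter - burn_in
    return incl_count / n_kept, beta_sum / n_kept


if __name__ == "__main__":
    # Hypothetical data: 80 segments, 6 candidate isoforms, 2 truly expressed.
    rng = np.random.default_rng(1)
    X = rng.integers(0, 2, size=(80, 6)).astype(float)
    y = X @ np.array([5.0, 0.0, 0.0, 3.0, 0.0, 0.0]) + rng.normal(0.0, 1.0, size=80)
    probs, beta_mean = spike_slab_gibbs(X, y)
    print(np.round(probs, 2), np.round(beta_mean, 2))
```

On simulated data like this, the inclusion probabilities should typically be near 1 for the two expressed isoforms and small for the rest. The exact numbers depend on the random draws.

One design choice is worth noting. Each indicator γ_j is sampled with β_j integrated out, and β_j is then drawn given γ_j. This generally mixes better than sampling γ_j conditional on a fixed β_j.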
The SparseIso package is available at http://github.com/henryxushi/SparseIso.
Supplementary data are available at Bioinformatics online.