Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA.
Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA.
Bioinformatics. 2021 May 5;37(5):650-658. doi: 10.1093/bioinformatics/btaa852.
High-throughput RNA sequencing has revolutionized the scope and depth of transcriptome analysis. Accurate reconstruction of a phenotype-specific transcriptome is challenging because RNA-seq data are noisy and variable. Such reconstruction requires computational identification of transcripts from multiple samples of the same phenotype, given the underlying consensus transcript structure.
We present a Bayesian method, integrated assembly of phenotype-specific transcripts (IntAPT), that identifies phenotype-specific isoforms from multiple RNA-seq profiles. IntAPT features a novel two-layer Bayesian model that captures the presence of isoforms at the group layer and quantifies the abundance of isoforms at the sample layer. A spike-and-slab prior is used to model isoform expression and to enforce sparsity among the expressed isoforms. Dependencies between the existence of isoforms and their expression are modeled explicitly to facilitate parameter estimation. Model parameters are estimated iteratively using Gibbs sampling to infer the joint posterior distribution, from which the presence and abundance of isoforms can be reliably determined. Studies using both simulations and real datasets show that IntAPT consistently outperforms existing methods. Experimental results demonstrate that, despite sequencing errors, IntAPT performs robustly across multiple samples, resulting in notably improved identification of expressed isoforms of low abundance.
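As a rough illustration of the two-layer idea described above, the following sketch draws from a generic spike-and-slab prior: a binary presence indicator per isoform at the group layer, and an abundance per sample that is exactly zero when the isoform is absent. It is not IntAPT's actual model or code. The function name, the gamma slab, and all parameter values are assumptions chosen for illustration, and the sketch only samples from the prior rather than performing Gibbs sampling of a posterior.

```python
import random


def sample_two_layer_spike_and_slab(n_isoforms, n_samples, p_present=0.3,
                                    slab_shape=2.0, slab_scale=5.0, seed=0):
    """Draw isoform presence (group layer) and abundances (sample layer).

    Hypothetical helper for illustration only; the gamma slab and the
    default parameters are assumptions, not taken from the IntAPT paper.
    """
    rng = random.Random(seed)

    # Group layer: one binary presence indicator per candidate isoform,
    # shared by every sample of the phenotype.
    presence = [1 if rng.random() < p_present else 0 for _ in range(n_isoforms)]

    # Sample layer: abundance is exactly zero (the "spike") when the isoform
    # is absent, and drawn from a gamma distribution (the "slab") when present.
    abundance = [
        [rng.gammavariate(slab_shape, slab_scale) if z else 0.0 for z in presence]
        for _ in range(n_samples)
    ]
    return presence, abundance


presence, abundance = sample_two_layer_spike_and_slab(n_isoforms=6, n_samples=3)
for row in abundance:
    print([round(x, 2) for x in row])
```

Because the indicator is shared across samples, an isoform marked absent at the group layer has zero abundance in every sample, which is how a prior of this kind encourages sparsity in the set of expressed isoforms.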
The IntAPT package is available at http://github.com/henryxushi/IntAPT.
Supplementary data are available at Bioinformatics online.