School of Computer Science, University of Manchester, Oxford Road, Manchester M13 9PL, UK.
Bioinformatics. 2012 Jul 1;28(13):1721-8. doi: 10.1093/bioinformatics/bts260. Epub 2012 May 3.
High-throughput sequencing enables expression analysis at the level of individual transcripts. Analysing transcriptome expression levels and estimating differential expression (DE) require a probabilistic approach that properly accounts for the ambiguity caused by shared exons and finite read sampling, as well as for the intrinsic biological variance of transcript expression.
We present Bayesian inference of transcripts from sequencing data (BitSeq), a Bayesian approach for estimation of transcript expression level from RNA-seq experiments. Inferred relative expression is represented by Markov chain Monte Carlo samples from the posterior probability distribution of a generative model of the read data. We propose a novel method for DE analysis across replicates which propagates uncertainty from the sample-level model while modelling biological variance using an expression-level-dependent prior. We demonstrate the advantages of our method using simulated data as well as an RNA-seq dataset with technical and biological replication for both studied conditions.
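To make the idea of representing expression by posterior samples concrete, the following is a minimal sketch of a Gibbs sampler for a simple read-assignment mixture model. It is an illustration under stated assumptions, not the BitSeq implementation: the function name `gibbs_expression`, the symmetric Dirichlet prior `alpha`, the toy read likelihoods, and the omission of transcript-length normalisation and any noise model are all choices made for this example.

```python
import random

def gibbs_expression(read_likelihoods, n_transcripts,
                     n_samples=500, burn_in=100, alpha=1.0, seed=0):
    """Draw posterior samples of relative transcript expression (theta).

    read_likelihoods: one dict per read, mapping a transcript index to an
    assumed P(read | transcript) > 0 for every transcript the read aligns to.
    Hypothetical toy model: theta ~ Dirichlet(alpha), and each read is
    assigned to one transcript in proportion to theta[m] * P(read | m).
    """
    rng = random.Random(seed)
    theta = [1.0 / n_transcripts] * n_transcripts
    samples = []
    for iteration in range(burn_in + n_samples):
        # Step 1: sample a transcript assignment for every read given theta.
        counts = [0] * n_transcripts
        for lik in read_likelihoods:
            candidates = list(lik)
            weights = [theta[m] * lik[m] for m in candidates]
            chosen = rng.choices(candidates, weights=weights)[0]
            counts[chosen] += 1
        # Step 2: sample theta given the assignments. A Dirichlet draw is
        # obtained by normalising independent Gamma draws.
        draws = [rng.gammavariate(alpha + c, 1.0) for c in counts]
        total = sum(draws)
        theta = [d / total for d in draws]
        if iteration >= burn_in:
            samples.append(theta)
    return samples

# Toy data: two transcripts that share an exon. 60 reads fall in the shared
# exon (ambiguous), 30 are unique to transcript 0, 10 are unique to transcript 1.
reads = [{0: 1.0, 1: 1.0}] * 60 + [{0: 1.0}] * 30 + [{1: 1.0}] * 10
samples = gibbs_expression(reads, n_transcripts=2)

# Posterior mean of each transcript's relative expression.
mean_theta = [sum(s[m] for s in samples) / len(samples) for m in range(2)]
```

The spread of `samples` around `mean_theta` reflects how uncertain the assignment of shared-exon reads is. Keeping the samples, rather than a single point estimate, is what allows this uncertainty to be carried into a downstream comparison between conditions.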
BitSeq, the implementation of the transcriptome expression estimation and differential expression analysis, is written in C++ and Python. The software is available online from http://code.google.com/p/bitseq/; version 0.4 was used to generate the results presented in this article.