Papastamoulis Panagiotis, Rattray Magnus
University of Manchester, UK.
J R Stat Soc Ser C Appl Stat. 2018 Jan;67(1):3-23. doi: 10.1111/rssc.12213. Epub 2017 Feb 7.
Recent advances in molecular biology allow the quantification of the transcriptome and the scoring of transcripts as differentially or equally expressed between two biological conditions. Although these two tasks are closely linked, the available inference methods treat them separately: a primary model is used to estimate expression and its output is post-processed by a differential expression model. In this paper, both issues are addressed simultaneously by proposing the joint estimation of expression levels and differential expression: the unknown relative abundance of each transcript can either be equal or not between two conditions. A hierarchical Bayesian model builds on the BitSeq framework, and the posterior distribution of transcript expression and differential expression is inferred by using Markov chain Monte Carlo sampling. It is shown that the proposed model enjoys conjugacy for fixed dimension variables; thus the full conditional distributions are analytically derived. Two samplers are constructed, a reversible jump Markov chain Monte Carlo sampler and a collapsed Gibbs sampler, and the latter is found to perform better. A cluster representation of the reads aligned to the transcriptome is introduced, allowing parallel estimation of the marginal posterior distribution of subsets of transcripts within reasonable computing time. Under a fixed prior probability of differential expression the clusterwise sampler has the same marginal posterior distributions as the raw sampler, but a more general prior structure is also employed. The proposed algorithm is benchmarked against alternative methods by using synthetic data sets and applied to real RNA sequencing data. Source code is available online from https://github.com/mqbssppe/cjBitSeq.
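The abstract does not spell out how the cluster representation is built. One plausible reading, stated here as an assumption rather than as the authors' implementation, is that transcripts are grouped whenever they are linked by reads aligning to more than one of them, so that each group can be sampled independently and in parallel. The sketch below illustrates that idea with a union-find structure; the function name `cluster_transcripts` and the input layout (one list of transcript ids per read) are hypothetical and are not taken from cjBitSeq.

```python
from collections import defaultdict


def cluster_transcripts(read_alignments):
    """Group transcripts into clusters linked by shared reads.

    read_alignments: iterable of lists, one per read, each holding the
    ids of the transcripts that read aligns to (hypothetical layout).
    Returns a sorted list of sorted transcript-id lists, one per cluster.
    """
    parent = {}

    def find(x):
        # Register unseen transcripts as their own root, then walk to the
        # root while shortening the path (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for transcripts in read_alignments:
        if not transcripts:
            continue  # unaligned read: links nothing
        first = transcripts[0]
        find(first)  # make sure uniquely mapped transcripts are registered
        for other in transcripts[1:]:
            union(first, other)  # a multi-mapping read links its transcripts

    clusters = defaultdict(list)
    for transcript in list(parent):
        clusters[find(transcript)].append(transcript)
    return sorted(sorted(members) for members in clusters.values())


# Toy input: five reads over six transcripts (illustrative, not real data).
reads = [["t1", "t2"], ["t2", "t3"], ["t4"], ["t5", "t4"], ["t6"]]
clusters = cluster_transcripts(reads)
print(clusters)  # [['t1', 't2', 't3'], ['t4', 't5'], ['t6']]
```

Under this reading, each cluster shares no reads with any other, which is what would let a sampler process the clusters in parallel.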