Department of Plant Sciences, University of Cambridge, Downing Street, Cambridge, CB2 3EA, UK.
BMC Bioinformatics. 2013 Apr 23;14:135. doi: 10.1186/1471-2105-14-135.
Pairing of samples arises naturally in many genomic experiments; for example, when gene expression is measured in tumour and normal tissue from the same patients. Methods for analysing high-throughput sequencing data from such experiments are required to identify differential expression, both within paired samples and between pairs under different experimental conditions.
We develop an empirical Bayesian method based on the beta-binomial distribution to model paired data from high-throughput sequencing experiments. We examine the performance of this method on simulated and real data in a variety of scenarios. Our methods are implemented as part of the R package baySeq (versions 1.11.6 and greater), available from Bioconductor (http://www.bioconductor.org).
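As a rough illustration of why a beta-binomial distribution suits paired counts, the sketch below treats the count in one member of a pair as a beta-binomial draw out of the pair's total count. This is not the baySeq implementation: the function names, the mean/dispersion parameterisation (`p`, `phi`) and the example counts are assumptions made for illustration, and the sketch ignores library-size scaling and the empirical Bayes estimation of priors.

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_logpmf(k, n, p, phi):
    """Log P(K = k) for K ~ BetaBinomial(n, alpha, beta).

    Assumed parameterisation (illustrative, not taken from the paper):
    p   = alpha / (alpha + beta)      expected proportion, in (0, 1)
    phi = 1 / (alpha + beta + 1)      overdispersion, in (0, 1)
    """
    scale = (1.0 - phi) / phi  # equals alpha + beta
    alpha = p * scale
    beta = (1.0 - p) * scale
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def paired_loglik(pairs, p, phi):
    """Log-likelihood of (count_a, count_b) pairs for a single gene.

    Each pair contributes count_a successes out of count_a + count_b trials.
    """
    return sum(betabinom_logpmf(x, x + y, p, phi) for x, y in pairs)


# Hypothetical tumour/normal counts for one gene in three patients.
pairs = [(30, 10), (45, 15), (28, 12)]
no_change = paired_loglik(pairs, p=0.5, phi=0.05)   # equal expression within pairs
shifted = paired_loglik(pairs, p=0.75, phi=0.05)    # consistent shift within pairs
print(shifted > no_change)
```

Under these assumptions, a proportion of 0.5 corresponds to no differential expression within pairs, while the dispersion parameter absorbs patient-to-patient variability in the proportion.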
We compare our approach to alternatives based on generalised linear modelling and show that our method offers significant gains in performance on simulated data. In testing on real data from oral squamous cell carcinoma patients, we discover greater enrichment of previously identified head and neck squamous cell carcinoma associated gene sets than has previously been achieved through a generalised linear modelling approach, suggesting that similar gains in performance may be found in real data. Our methods thus show real and substantial improvements in the analysis of high-throughput sequencing data from paired samples.