Hardcastle Thomas J
Department of Plant Sciences, University of Cambridge, Cambridge CB2 3EA, UK.
Bioinformatics. 2016 Jan 15;32(2):195-202. doi: 10.1093/bioinformatics/btv569. Epub 2015 Oct 1.
High-throughput data are now commonplace in biological research. Rapidly changing technologies and applications mean that novel methods for detecting differential behaviour that account for a 'large P, small n' setting are required at an increasing rate. Such methods are generally developed on an ad hoc basis, which leads to further development cycles and a lack of standardization between analyses.
We present here a generalized method for identifying differential behaviour within high-throughput biological data through empirical Bayesian methods. This approach is based on our baySeq algorithm for identification of differential expression in RNA-seq data based on a negative binomial distribution, and in paired data based on a beta-binomial distribution. Here we show how the same empirical Bayesian approach can be applied to any parametric distribution, removing the need for lengthy development of novel methods for differently distributed data. Comparisons with existing methods developed to address specific problems in high-throughput biological data show that these generic methods can achieve equivalent or better performance. A number of enhancements to the basic algorithm are also presented to increase flexibility and reduce computational costs.
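As a rough illustration of how one empirical Bayesian scheme can serve any parametric distribution, the following Python sketch scores each row of a toy data set for differential behaviour between two groups. It is not the baySeq implementation and does not use its API. The function names, the toy counts, the Poisson log-density, the use of per-group row means as the empirical prior sample, and the equal prior model weights are all assumptions made for this example. Only the log-density function would need to change to model differently distributed data.

```python
import math

def poisson_logpmf(x, rate):
    """Log-density of one observation (assumed Poisson; any parametric
    log-density with the same signature could be passed instead)."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)

def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values) / len(values))

def log_marginal(obs, prior_params, logpdf):
    """Approximate the log marginal likelihood of obs by averaging the
    likelihood over an empirical sample of parameter values."""
    return log_mean_exp([sum(logpdf(x, p) for x in obs) for p in prior_params])

def posterior_differential(row, groups, prior_params, logpdf, prior_prob=0.5):
    """Posterior probability that the groups have different parameters."""
    # Null model: every sample in the row shares one parameter.
    log_null = math.log(1.0 - prior_prob) + log_marginal(row, prior_params, logpdf)
    # Differential model: each group has its own independent parameter.
    log_diff = math.log(prior_prob)
    for label in sorted(set(groups)):
        obs = [x for x, g in zip(row, groups) if g == label]
        log_diff += log_marginal(obs, prior_params, logpdf)
    peak = max(log_null, log_diff)
    w_null = math.exp(log_null - peak)
    w_diff = math.exp(log_diff - peak)
    return w_diff / (w_null + w_diff)

def group_means(row, groups):
    """Mean of the observations in each group of one row."""
    means = []
    for label in sorted(set(groups)):
        obs = [x for x, g in zip(row, groups) if g == label]
        means.append(sum(obs) / len(obs))
    return means

# Hypothetical toy counts: three rows (e.g. genes), two samples per group.
counts = [
    [10, 12, 9, 11],
    [5, 6, 30, 28],
    [20, 19, 21, 22],
]
groups = ["A", "A", "B", "B"]

# Simplified empirical prior: the per-group means observed across all rows.
prior_params = sorted({m for row in counts for m in group_means(row, groups)})

posteriors = [
    posterior_differential(row, groups, prior_params, poisson_logpmf)
    for row in counts
]
for row, post in zip(counts, posteriors):
    print(row, round(post, 3))
```

With these toy counts, the second row, whose groups differ markedly, receives a much higher posterior probability of differential behaviour than the first. Replacing `poisson_logpmf` with another log-density (and parameter tuples in place of scalar rates) leaves the rest of the sketch unchanged.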
The methods are implemented in the R baySeq (v2) package, available on Bioconductor at http://www.bioconductor.org/packages/release/bioc/html/baySeq.html.
Supplementary data are available at Bioinformatics online.