Department of Bioinformatics, School of Life Sciences and Technology, Tongji University, 1239 Siping Road, Shanghai 200092, China.
Bioinformatics. 2012 Nov 1;28(21):2782-8. doi: 10.1093/bioinformatics/bts515. Epub 2012 Aug 24.
RNA-seq has been widely used in transcriptome analysis to measure gene expression levels. Although sequencing costs are falling rapidly, almost 70% of all human RNA-seq samples in the Gene Expression Omnibus (GEO) have no biological replicates, and in 2011 more unreplicated than replicated RNA-seq data sets were published. Despite this large number of single-replicate studies, there is currently no satisfactory method for detecting differentially expressed genes when only a single biological replicate is available.
We present the GFOLD (generalized fold change) algorithm, which produces biologically meaningful rankings of differentially expressed genes from RNA-seq data. GFOLD assigns reliable statistics to expression changes based on the posterior distribution of the log fold change. In this way, GFOLD overcomes the shortcomings of the P-values and fold changes calculated by existing RNA-seq analysis methods, and gives more stable and biologically meaningful gene rankings when only a single biological replicate is available.
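The idea of ranking genes by a quantile of the posterior distribution of the log fold change, rather than by the raw ratio, can be illustrated with a small sketch. This is not the GFOLD implementation: the Poisson read-count model, the Jeffreys prior (Gamma shape 0.5), the Monte Carlo sampling, the 0.01 cutoff, and the rule "report the lower quantile if it is above zero, the upper quantile if it is below zero, otherwise 0" are all assumptions chosen for illustration, and the function name `gfold_sketch` is hypothetical.

```python
import math
import random


def gfold_sketch(count_a, count_b, depth_a, depth_b, cutoff=0.01,
                 n_samples=20000, prior=0.5, seed=0):
    """Hypothetical posterior-quantile log2 fold change (condition B vs A).

    Assumed model: reads ~ Poisson(rate * depth) with a Jeffreys prior on the
    rate, so the posterior of each rate is Gamma(shape=count + 0.5,
    scale=1/depth). The posterior of log2(rate_b / rate_a) is sampled by
    Monte Carlo and summarized by a conservative quantile.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    samples = []
    for _ in range(n_samples):
        # Floor at a tiny positive value so log2 never sees an exact 0.0.
        rate_a = max(rng.gammavariate(count_a + prior, 1.0 / depth_a), 1e-300)
        rate_b = max(rng.gammavariate(count_b + prior, 1.0 / depth_b), 1e-300)
        samples.append(math.log2(rate_b) - math.log2(rate_a))
    samples.sort()
    lower = samples[int(cutoff * (n_samples - 1))]
    upper = samples[int((1.0 - cutoff) * (n_samples - 1))]
    if lower > 0:   # confidently up-regulated: report the cautious lower bound
        return lower
    if upper < 0:   # confidently down-regulated: report the cautious upper bound
        return upper
    return 0.0      # posterior interval straddles zero: no reliable change


# Same 10x raw ratio, very different evidence (depths of 1e6 reads assumed).
print(gfold_sketch(1, 10, 1e6, 1e6))
print(gfold_sketch(100, 1000, 1e6, 1e6))
```

Both calls have a raw fold change of 10, but the low-count gene's posterior is much wider, so its quantile-based score should come out smaller. This is the kind of stabilization the abstract attributes to ranking by the posterior of the log fold change instead of by the point ratio.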
The open-source C/C++ program is available at http://www.tongji.edu.cn/~zhanglab/GFOLD/index.html