BSRC Alexander Fleming.
Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa156.
The study of differential gene expression patterns through RNA-Seq is a routine task for molecular bioscientists, who produce vast amounts of data requiring proper management and analysis. Despite the widespread use of RNA-Seq, there are still no widely accepted gold standards for the normalization and statistical analysis of its data, and critical biases, such as those arising from gene length and from difficulties in detecting certain types of molecules, remain largely unaddressed. Motivated by these unmet needs and by the lack of in-depth research into the potential of combinatorial methods to enhance the analysis of differential gene expression, we previously introduced the PANDORA P-value combination algorithm and presented evidence of its superior performance in optimizing the tradeoff between precision and sensitivity. In this article, we present the next generation of the algorithm along with a more in-depth investigation of its ability to analyze RNA-Seq data effectively. In particular, we show that PANDORA-reported lists of differentially expressed genes are unaffected by biases introduced by different normalization methods and, at the same time, provide a reliable input for downstream pathway analysis. Additionally, PANDORA outperforms other methods in detecting differential expression patterns in certain transcript types, including long non-coding RNAs.
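To make the idea of P-value combination concrete, the following minimal sketch merges the P-values that several differential expression tests report for the same gene into a single value. It uses Fisher's method purely as a familiar illustration. The abstract does not describe PANDORA's own combination rule, so this should not be read as PANDORA's implementation. The function name `fisher_combine`, the gene labels, and the P-values are hypothetical. Fisher's method also assumes independent tests, which is unlikely to hold when several tools analyze the same count data.

```python
import math


def fisher_combine(p_values):
    """Combine p-values for one gene with Fisher's method (illustrative only).

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, assuming independent tests.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2

    # For even degrees of freedom (2k) the chi-square survival function
    # has a closed form: exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical p-values for two genes, one per statistical test.
per_gene_p_values = {
    "geneA": [0.01, 0.03, 0.02],
    "geneB": [0.40, 0.75, 0.55],
}
combined = {gene: fisher_combine(ps) for gene, ps in per_gene_p_values.items()}
```

In this toy input, the gene with consistently small P-values receives a combined P-value smaller than any of its individual ones, while the gene with uniformly large P-values stays non-significant. A real analysis would still need multiple-testing correction across genes.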