Moulos Panagiotis, Hatzis Pantelis
Biomedical Sciences Research Center 'Alexander Fleming', 34 Fleming str, 16672, Vari, Greece
Nucleic Acids Res. 2015 Feb 27;43(4):e25. doi: 10.1093/nar/gku1273. Epub 2014 Dec 1.
RNA-Seq is gradually becoming the standard tool for transcriptomic expression studies in biological research. Although considerable progress has been made in developing statistical algorithms that detect differentially expressed genes from RNA-Seq data, the lists of detected genes can differ significantly between algorithms. We present a new method (PANDORA) that combines multiple algorithms into a summarized result that reflects true experimental outcomes more effectively. It does so by systematically combining several analysis algorithms and weighting their outcomes according to their performance on realistic simulated data sets generated from real data. Analyses of both simulated and real data from different organisms, together with correlation with PolII occupancy, demonstrate that PANDORA improves the detection of differential expression by optimizing the tradeoff between standard performance measures such as precision and sensitivity.
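The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not state how the weights are derived or how the outcomes are combined, so everything here is an assumption made for illustration: the algorithm names (`algo_a`, `algo_b`, `algo_c`), the performance scores, the p-values, and the choice of a weighted average of per-gene p-values as the combination rule.

```python
from typing import Dict, List


def normalize_weights(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn per-algorithm performance scores into weights that sum to 1.

    Assumption: a higher score means better performance on simulated data.
    """
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("performance scores must sum to a positive value")
    return {name: score / total for name, score in scores.items()}


def combine_pvalues(
    pvalues: Dict[str, List[float]], weights: Dict[str, float]
) -> List[float]:
    """Combine per-gene p-values as a weighted average across algorithms.

    Assumption: this combination rule is illustrative only; the abstract
    does not specify the exact rule.
    """
    lengths = {len(values) for values in pvalues.values()}
    if len(lengths) != 1:
        raise ValueError("every algorithm must report one p-value per gene")
    n_genes = lengths.pop()
    return [
        sum(weights[name] * pvalues[name][gene] for name in pvalues)
        for gene in range(n_genes)
    ]


# Hypothetical performance scores obtained on simulated data sets.
scores = {"algo_a": 0.9, "algo_b": 0.6, "algo_c": 0.5}
weights = normalize_weights(scores)

# Hypothetical p-values for two genes from each algorithm.
pvalues = {
    "algo_a": [0.01, 0.40],
    "algo_b": [0.03, 0.20],
    "algo_c": [0.20, 0.60],
}
combined = combine_pvalues(pvalues, weights)
print(combined)
```

In this sketch, an algorithm that performs better on the simulated data contributes more to each gene's combined value, which is the general principle the abstract describes.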