Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland.
Institute of Biomedicine, University of Turku, Turku, Finland.
RNA Biol. 2021 Nov;18(11):1739-1746. doi: 10.1080/15476286.2020.1868151. Epub 2021 Jan 30.
Detection of differentially expressed genes (DEGs) between different biological conditions is a key data analysis step of most RNA-sequencing studies. Conventionally, computational tools have used gene-level read counts as input to test for differential gene expression between sample condition groups. Recently, it has been suggested that statistical testing could be performed with increased power at a lower feature level prior to aggregating the results to the gene level. In this study, we systematically compared the performance of calling DEGs when using read count data at different levels (gene, transcript, and exon) as input, on two publicly available data sets. Additionally, we tested two different methods for aggregating the lower feature-level p-values to the gene level: the Lancaster method and the empirical Brown's method. Our results show that detection of DEGs is improved compared to the conventional gene-level approach regardless of the lower feature level used for statistical testing. The overall best balance between accuracy and false discovery rate was obtained using the exon-level approach with the empirical Brown's aggregation method, which we provide as a freely available Bioconductor package, EBSEA (https://bioconductor.org/packages/release/bioc/html/EBSEA.html).
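To make the aggregation step concrete, the following is a minimal sketch of how feature-level p-values can be combined into one gene-level p-value. It deliberately uses Fisher's method, the simpler baseline that both methods named above build on, and is not the implementation used in the study or in EBSEA: the Lancaster method generalizes Fisher's method by weighting each p-value, and the empirical Brown's method adjusts it for dependence among the p-values, whereas this sketch assumes independent, unweighted p-values. The function name `fisher_combine`, the gene names, and the example p-values are hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine p-values with Fisher's method (assumes independence).

    The statistic T = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, where k is the number of
    p-values. For even degrees of freedom the upper tail has the closed
    form exp(-x) * sum_{j<k} x**j / j!, with x = T / 2.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # x = T / 2

    # Accumulate sum_{j=0}^{k-1} x**j / j! term by term.
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term

    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical exon-level p-values grouped by gene (illustration only).
exon_p = {
    "geneA": [0.01, 0.04, 0.20],
    "geneB": [0.60, 0.75],
}
gene_p = {gene: fisher_combine(ps) for gene, ps in exon_p.items()}
```

Because exons of the same gene tend to be correlated, the independence assumption made here is generally violated in practice, which is the kind of dependence the empirical Brown's method is designed to account for.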