Univ. Grenoble Alpes, INSERM, CEA-IRIG, Biomics, Grenoble, France.
Center for Computational Biology, Mines ParisTech, PSL Research University, Paris, France.
PLoS Comput Biol. 2023 Mar 9;19(3):e1010342. doi: 10.1371/journal.pcbi.1010342. eCollection 2023 Mar.
The majority of gene expression studies focus on the search for genes whose mean expression differs between two or more populations of samples, in the so-called "differential expression analysis" approach. However, a difference in the variance of gene expression may also be biologically and physiologically relevant. In the classical statistical model used to analyze RNA-sequencing (RNA-seq) data, the dispersion, which defines the variance, is considered only as a parameter to be estimated prior to identifying a difference in mean expression between conditions of interest. Here, we evaluate four recently published methods that detect differences in both the mean and the dispersion in RNA-seq data. We thoroughly investigated the performance of these methods on simulated datasets and characterized parameter settings that reliably detect genes with differential expression dispersion. We then applied these methods to The Cancer Genome Atlas (TCGA) datasets. Interestingly, among the genes with increased expression dispersion in tumors and no change in mean expression, we identified some key cellular functions, most of which were related to catabolism and were overrepresented in most of the analyzed cancers. In particular, our results highlight autophagy, whose role in carcinogenesis is context-dependent, illustrating the potential of the differential dispersion approach to gain new insights into biological processes and to discover new biomarkers.
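To make the distinction between mean and dispersion concrete, here is a minimal sketch. It assumes the negative binomial (NB2) parameterization commonly used for RNA-seq counts, in which a gene with mean mu and dispersion phi has variance mu + phi * mu^2. The abstract does not name the model, so this parameterization is an assumption. The sketch simulates two hypothetical groups ("normal" and "tumor") with the same mean but a tenfold difference in dispersion. It then recovers each dispersion with a simple method-of-moments estimator. The group labels, parameter values, and the estimator are illustrative choices. The estimator is not one of the four methods evaluated in the study.

```python
import numpy as np


def nb_variance(mu: float, phi: float) -> float:
    """Variance of an NB2 count with mean mu and dispersion phi (assumed model)."""
    return mu + phi * mu ** 2


def simulate_nb(mu: float, phi: float, n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n NB2 counts with mean mu and dispersion phi.

    NumPy's negative_binomial(r, p) counts failures before r successes and has
    mean r * (1 - p) / p. Setting r = 1 / phi and p = r / (r + mu) therefore
    gives mean mu and variance mu + phi * mu**2.
    """
    r = 1.0 / phi
    p = r / (r + mu)
    return rng.negative_binomial(r, p, size=n)


def moment_dispersion(x: np.ndarray) -> float:
    """Method-of-moments dispersion estimate, (var - mean) / mean**2, floored at 0.

    Illustrative only: not one of the methods evaluated in the study.
    """
    m = x.mean()
    v = x.var(ddof=1)
    return max((v - m) / m ** 2, 0.0)


rng = np.random.default_rng(0)
mu = 100.0
# Hypothetical groups: identical mean, tenfold difference in dispersion.
normal = simulate_nb(mu, phi=0.05, n=20000, rng=rng)
tumor = simulate_nb(mu, phi=0.5, n=20000, rng=rng)

for name, x, phi in [("normal", normal, 0.05), ("tumor", tumor, 0.5)]:
    print(
        f"{name}: mean={x.mean():.1f} "
        f"var={x.var(ddof=1):.0f} (expected {nb_variance(mu, phi):.0f}) "
        f"phi_hat={moment_dispersion(x):.3f}"
    )
```

With these settings the two groups should have nearly identical sample means, while their estimated dispersions differ by roughly a factor of ten. A test on the mean alone would not flag a gene like this, but a differential dispersion test is designed to.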