Kamp Thomas, Adams Micah, Disselkoen Craig, Tintle Nathan
Department of Mathematics, Statistics, and Computer Science, Dordt College, Sioux Center, IA 51250, USA
Pac Symp Biocomput. 2017;22:449-460. doi: 10.1142/9789813207813_0042.
Gene set analysis methods continue to be a popular and powerful means of evaluating genome-wide transcriptomics data. These approaches require grouping genes a priori into biologically meaningful sets and then conducting downstream analyses at the level of the set rather than the individual gene. Gene set analysis methods have been shown to yield more powerful statistical conclusions than single-gene analyses. They reduce the multiple-testing penalty, and aggregating effects across multiple genes in a set can produce larger observed effects. Traditionally, gene set analysis methods have been applied directly to normalized, log-transformed transcriptomics data. Recently, efforts have been made to transform transcriptomics data to scales that yield more biologically interpretable results. For example, recently proposed models transform log-transformed transcriptomics data into a confidence metric (ranging from 0 to 100%) that a gene is active (roughly speaking, that the gene product is part of an active cellular mechanism). In this manuscript, we demonstrate on both real and simulated transcriptomics data that gene set tests for differential expression are typically more powerful when they use gene activity state estimates than when they use log-transformed gene expression data. Our analysis suggests that techniques for transforming transcriptomics data into meaningful quantities merit further exploration as a route to improved downstream inference.
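The abstract does not specify either the activity transform or the set-level test, so the following is only a minimal Python sketch under stated assumptions. It runs one gene set test on two scales: log-expression and an estimated probability that each gene is active. `activity_probability` assumes a hypothetical two-state model in which log-expression is Gaussian with known means for "inactive" and "active" genes, a shared standard deviation, and equal priors. This is not necessarily the model the authors used. The set-level test is a simple sample-level mean-score permutation test, chosen here for illustration. All function names, simulation settings, and parameter values are hypothetical.

```python
import math
import random
import statistics


def activity_probability(x, mu_off, mu_on, sd):
    """Hypothetical transform: posterior probability that a gene is active,
    assuming log-expression ~ N(mu_off, sd) when inactive and N(mu_on, sd)
    when active, with equal prior weight on the two states."""
    # With equal variances and priors the posterior is a logistic function of x
    # centered at the midpoint of the two component means.
    z = (mu_on - mu_off) / sd**2 * (x - (mu_on + mu_off) / 2)
    # Numerically stable logistic: never exponentiate a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def set_scores(matrix, gene_set):
    """Per-sample gene set score: mean of the set's gene values in each sample.
    `matrix` maps gene name -> list of per-sample values."""
    n_samples = len(next(iter(matrix.values())))
    return [statistics.fmean([matrix[g][j] for g in gene_set]) for j in range(n_samples)]


def permutation_pvalue(scores, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a Welch-type statistic comparing the
    set scores of samples labeled 1 against samples labeled 0."""
    rng = random.Random(seed)

    def stat(lab):
        a = [s for s, l in zip(scores, lab) if l == 1]
        b = [s for s, l in zip(scores, lab) if l == 0]
        se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
        return abs(statistics.fmean(a) - statistics.fmean(b)) / se

    observed = stat(labels)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # break the link between samples and group labels
        if stat(perm) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def simulate(n_per_group=20, n_genes=10, shift=4.0, seed=1):
    """Toy data (not from the paper): group 0 genes ~ N(4, 1) on the log scale,
    group 1 genes shifted up by `shift`."""
    rng = random.Random(seed)
    labels = [0] * n_per_group + [1] * n_per_group
    genes = [f"g{i}" for i in range(n_genes)]
    matrix = {g: [rng.gauss(4.0 + shift * lab, 1.0) for lab in labels] for g in genes}
    return matrix, genes, labels


if __name__ == "__main__":
    matrix, gene_set, labels = simulate()
    # Convert every log-expression value to an activity probability (assumed model).
    activity = {g: [activity_probability(x, 4.0, 8.0, 1.0) for x in vals]
                for g, vals in matrix.items()}
    for name, data in (("log-expression", matrix), ("activity", activity)):
        p = permutation_pvalue(set_scores(data, gene_set), labels)
        print(f"{name}: permutation p = {p:.4f}")
```

Because the test is identical on both scales, any difference in power comes only from the transform. A realistic comparison would use many simulated data sets with modest effects, in the spirit of the simulations the abstract describes, instead of the single strongly separated toy data set used here.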