Zhang Ray, Podtelezhnikov Alexei A, Hogenesch John B, Anafi Ron C
Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania.
Department of Genetics and Pharmacogenomics, Merck Research Laboratories, West Point, Pennsylvania.
J Biol Rhythms. 2016 Jun;31(3):244-57. doi: 10.1177/0748730416631895. Epub 2016 Mar 8.
Several tools use prior biological knowledge to interpret gene expression data. However, existing enrichment tools assume that variables are monotonic and incorrectly measure the distance between periodic phases. As a result, these tools are poorly suited for the analysis of the cell cycle, circadian clock, or other periodic systems. Here, we develop Phase Set Enrichment Analysis (PSEA) to incorporate prior knowledge into the analysis of periodic data. PSEA identifies biologically related gene sets showing temporally coordinated expression. Using synthetic gene sets of various sizes generated from von Mises (circular normal) distributions, we benchmarked PSEA alongside existing methods. PSEA offered enhanced sensitivity over a broad range of von Mises distributions and gene set sizes. Importantly, and unlike existing tools, the sensitivity of PSEA is independent of the mean expression phase of the set. We applied PSEA to 4 published datasets. Application of PSEA to the mouse circadian atlas revealed that several pathways, including those regulating immune and cell-cycle function, demonstrate temporal orchestration across multiple tissues. We then applied PSEA to the phase shifts following a restricted feeding paradigm. We found that this perturbation disrupts intraorgan metabolic synchrony in the liver, altering the timing between anabolic and catabolic pathways. Reanalysis of expression data using custom gene sets derived from recent ChIP-seq results revealed that circadian transcriptional targets bound exclusively by CLOCK, independently of BMAL1, differ from other exclusive circadian output genes and have well-synchronized phases. Finally, we used PSEA to compare 2 cell-cycle datasets. PSEA increased the apparent biological overlap while also revealing evidence of cell-cycle dysregulation in these cancer cells. To encourage its use by the community, we have implemented PSEA as a Java application.
In sum, PSEA offers a powerful new tool to investigate large-scale, periodic data for biological insight.
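The abstract's point about distance between periodic phases can be illustrated with a small sketch. This is not the PSEA implementation (which the abstract describes as a Java application) and does not reproduce its statistical test. It only shows, under an assumed 24-hour period, why a linear distance misjudges phases near the wrap-around point and how synthetic phase sets can be drawn from a von Mises distribution. The names `PERIOD`, `circular_distance`, `circular_mean`, and `synthetic_phases` are hypothetical and chosen for illustration.

```python
import math
import random

PERIOD = 24.0  # hours; an assumed circadian period, for illustration only


def linear_distance(a, b):
    """Distance that treats phase as an ordinary, non-periodic variable."""
    return abs(a - b)


def circular_distance(a, b, period=PERIOD):
    """Shortest distance between two phases around a cycle of given period."""
    d = abs(a - b) % period
    return min(d, period - d)


def circular_mean(phases, period=PERIOD):
    """Mean phase computed from the summed unit vectors of the phases."""
    angles = [2 * math.pi * p / period for p in phases]
    s = sum(math.sin(x) for x in angles)
    c = sum(math.cos(x) for x in angles)
    return (math.atan2(s, c) * period / (2 * math.pi)) % period


def synthetic_phases(n, mean_phase, kappa, period=PERIOD, seed=None):
    """Draw n phases (in hours) from a von Mises distribution.

    kappa is the concentration: larger values give tighter clustering,
    and kappa = 0 gives a uniform distribution around the cycle.
    """
    rng = random.Random(seed)
    mu = (2 * math.pi * mean_phase / period) % (2 * math.pi)
    return [rng.vonmisesvariate(mu, kappa) * period / (2 * math.pi)
            for _ in range(n)]


# Two genes peaking at hour 23 and hour 1 are 2 hours apart on the clock,
# but a linear measure reports 22 hours.
print(linear_distance(23.0, 1.0))    # 22.0
print(circular_distance(23.0, 1.0))  # 2.0

# Their circular mean lies at the 0/24 boundary, not at hour 12.
print(circular_mean([23.0, 1.0]))

# A tightly clustered synthetic set centered near the wrap-around point.
print(synthetic_phases(5, mean_phase=23.5, kappa=8.0, seed=1))
```

A set clustered around hour 0 straddles the wrap-around point, so a method built on linear distances sees it as split between the two ends of the scale. This is one way to read the abstract's statement that the sensitivity of existing tools depends on the mean expression phase of the set.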