Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Department of Computer Science, University of Virginia, Charlottesville, VA 22904, USA.
Cell Syst. 2018 Aug 22;7(2):208-218.e11. doi: 10.1016/j.cels.2018.05.022. Epub 2018 Jun 20.
Large amounts of multi-species functional genomic data from high-throughput assays are becoming available to help understand the molecular mechanisms underlying phenotypic diversity across species. However, continuous-trait probabilistic models, which are key to such comparative analysis, remain under-explored. Here we develop a new model, called phylogenetic hidden Markov Gaussian processes (Phylo-HMGP), to simultaneously infer heterogeneous evolutionary states of functional genomic features in a genome-wide manner. Both simulation studies and a real-data application demonstrate the effectiveness of Phylo-HMGP. Importantly, we applied Phylo-HMGP to analyze a new cross-species DNA replication timing (RT) dataset from the same cell type in five primate species (human, chimpanzee, orangutan, gibbon, and green monkey). We demonstrate that our Phylo-HMGP model enables discovery of genomic regions with distinct evolutionary patterns of RT. Our method provides a generic framework for comparative analysis of multi-species continuous functional genomic signals to help reveal regions with conserved or lineage-specific regulatory roles.
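To make the general idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It treats each genomic bin as a vector of continuous signals (one value per species), gives each hidden state a Gaussian emission whose covariance comes from a phylogenetic tree, and decodes the most likely state path along the genome with the Viterbi algorithm. Everything specific here is an assumption made for illustration: the Brownian-motion covariance (the abstract does not say which Gaussian process the model uses), the three-species toy tree, the two states and their means, the transition probabilities, and all function names.

```python
import numpy as np

def bm_covariance(shared, sigma2):
    """Brownian-motion covariance on a tree (an assumed trait model):
    sigma2 times the branch length two species share from the root."""
    return sigma2 * np.asarray(shared, dtype=float)

def log_gauss(x, mean, cov):
    """Log density of a multivariate normal, evaluated for each row of x."""
    d = x.shape[1]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def viterbi(log_emit, log_trans, log_init):
    """Most likely hidden-state path given per-bin log emission scores."""
    n, k = log_emit.shape
    score = log_init + log_emit[0]
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + log_trans      # cand[i, j]: move from i to j
        back[t] = np.argmax(cand, axis=0)      # best predecessor of each j
        score = cand[back[t], np.arange(k)] + log_emit[t]
    path = np.empty(n, dtype=int)
    path[-1] = np.argmax(score)
    for t in range(n - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Hypothetical toy tree ((A,B),C): A and B share half of their root-to-tip path.
shared = [[1.0, 0.5, 0.0],
          [0.5, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
cov = bm_covariance(shared, sigma2=0.1)

# Two hypothetical states: 0 = similar signal in all species,
# 1 = signal shifted in the (A,B) lineage only.
means = np.array([[0.0, 0.0, 0.0],
                  [2.0, 2.0, 0.0]])

# Ten genomic bins of made-up signal values: five of each pattern.
signals = np.vstack([np.tile([0.1, -0.1, 0.0], (5, 1)),
                     np.tile([2.1, 1.9, 0.0], (5, 1))])

log_emit = np.column_stack([log_gauss(signals, m, cov) for m in means])
log_trans = np.log(np.array([[0.9, 0.1],
                             [0.1, 0.9]]))
log_init = np.log(np.array([0.5, 0.5]))

path = viterbi(log_emit, log_trans, log_init)
print(path)
```

In this sketch the parameters are fixed by hand. A full method would estimate the state means, covariances, and transition probabilities from the data, for example with an expectation-maximization procedure, which is omitted here.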