Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
Physiology, Biophysics, and Systems Biology, Weill Cornell Medical College, New York, NY.
Mol Biol Evol. 2020 Jul 1;37(7):2137-2152. doi: 10.1093/molbev/msaa073.
Evolutionary changes in gene expression are often driven by gains and losses of cis-regulatory elements (CREs). The dynamics of CRE evolution can be examined using multispecies epigenomic data, but so far such analyses have generally been descriptive and model-free. Here, we introduce a probabilistic modeling framework for the evolution of CREs that operates directly on raw chromatin immunoprecipitation and sequencing (ChIP-seq) data and fully considers the phylogenetic relationships among species. Our framework includes a phylogenetic hidden Markov model, called epiPhyloHMM, for identifying the locations of multiply aligned CREs, and a combined phylogenetic and generalized linear model, called phyloGLM, for accounting for the influence of a rich set of genomic features in describing their evolutionary dynamics. We apply these methods to previously published ChIP-seq data for the H3K4me3 and H3K27ac histone modifications in liver tissue from nine mammals. We find that enhancers are gained and lost during mammalian evolution at about twice the rate of promoters, and that turnover rates are negatively correlated with DNA sequence conservation, expression level, and tissue breadth, and positively correlated with distance from the transcription start site, consistent with previous findings. In addition, we find that the predicted dosage sensitivity of target genes positively correlates with DNA sequence constraint in CREs but not with turnover rates, perhaps owing to differences in the effect sizes of the relevant mutations. Altogether, our probabilistic modeling framework enables a variety of powerful new analyses.
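The idea of CRE gain and loss along a phylogeny can be illustrated with a minimal sketch. This is not the epiPhyloHMM or phyloGLM implementation. It assumes a much simpler setting in which each species has a binary presence/absence call for a CRE (the abstract states that the actual framework works on raw ChIP-seq data), and it models turnover as a two-state continuous-time Markov chain whose likelihood is computed by Felsenstein's pruning algorithm. The tree shape, branch lengths, species labels, rates, and function names are all hypothetical.

```python
import math


def transition_matrix(gain, loss, t):
    """Closed-form 2x2 transition matrix for a gain/loss process.

    State 0 = element absent, state 1 = element present.
    Entry [i][j] is P(state j after time t | state i at time 0).
    """
    total = gain + loss
    decay = math.exp(-total * t)
    pi0 = loss / total  # stationary probability of absence
    pi1 = gain / total  # stationary probability of presence
    return [
        [pi0 + pi1 * decay, pi1 * (1.0 - decay)],
        [pi0 * (1.0 - decay), pi1 + pi0 * decay],
    ]


def partial_likelihoods(node, states, gain, loss):
    """Return [L(0), L(1)]: probability of the leaf data below `node`
    given that `node` itself is in state 0 or 1 (Felsenstein pruning).

    A leaf is a species name (str); an internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        observed = states[node]
        return [1.0 if observed == s else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child, branch_length in node:
        child_partials = partial_likelihoods(child, states, gain, loss)
        p = transition_matrix(gain, loss, branch_length)
        for parent_state in (0, 1):
            result[parent_state] *= sum(
                p[parent_state][c] * child_partials[c] for c in (0, 1)
            )
    return result


def tree_likelihood(tree, states, gain, loss):
    """Likelihood of a presence/absence pattern, with the root state
    drawn from the stationary distribution (an assumption of this sketch)."""
    total = gain + loss
    root_prior = [loss / total, gain / total]
    partials = partial_likelihoods(tree, states, gain, loss)
    return sum(root_prior[s] * partials[s] for s in (0, 1))


# Hypothetical three-species tree with made-up branch lengths.
toy_tree = [([("human", 0.1), ("mouse", 0.3)], 0.05), ("dog", 0.2)]
pattern = {"human": 1, "mouse": 0, "dog": 1}

# Compare a slower and a faster turnover regime (illustrative rates only).
for label, gain, loss in [("slow", 0.5, 1.0), ("fast", 1.0, 2.0)]:
    print(label, tree_likelihood(toy_tree, pattern, gain, loss))
```

Doubling both rates in the "fast" regime keeps the stationary presence frequency fixed while making discordant patterns across species more likely, which is one simple way to think about a class of elements turning over at about twice the rate of another.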