Department of Biomedical Engineering and McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA.
Nat Commun. 2024 Jul 31;15(1):6464. doi: 10.1038/s41467-024-50708-z.
Gene regulatory elements drive complex biological phenomena, and their mutations are associated with common human diseases. The impacts of human regulatory variants are often tested using model organisms such as mice. However, mapping human enhancers to conserved elements in mice remains a challenge, due to both rapid enhancer evolution and limitations of current computational methods. We analyze distal enhancers across 45 matched human/mouse cell/tissue pairs from a comprehensive dataset of DNase-seq experiments, and show that while cell-specific regulatory vocabulary is conserved, enhancers evolve more rapidly than promoters and CTCF binding sites. Enhancer conservation rates vary across cell types, in part explainable by tissue-specific transposable element activity. We present an improved genome alignment algorithm using gapped-kmer features, called gkm-align, and make genome-wide predictions for 1,401,803 orthologous regulatory elements. We show that gkm-align discovers 23,660 novel human/mouse conserved enhancers missed by previous algorithms, with strong evidence of conserved functional activity.
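The abstract names gapped-kmer features as the basis of gkm-align but does not describe how they are scored. As a rough illustration of what such features are (not the gkm-align implementation), the sketch below counts gapped k-mers in the style popularized by gkm-SVM: every length-`l` window is expanded into all patterns that keep `k` informative positions and mask the other `l - k` as gaps. Two sequences are then compared by cosine similarity of their gapped k-mer counts. The function names `gapped_kmer_counts` and `gkm_cosine`, the defaults `l=6, k=4`, and the omission of reverse-complement collapsing are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
import math


def gapped_kmer_counts(seq: str, l: int = 6, k: int = 4) -> Counter:
    """Hypothetical helper: count gapped k-mers in `seq`.

    For every length-l window, emit one pattern per choice of k kept
    positions; the remaining l - k positions are masked with '-'.
    Reverse complements are NOT merged in this simplified sketch.
    """
    seq = seq.upper()
    counts: Counter = Counter()
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        for keep in combinations(range(l), k):
            pattern = "".join(window[i] if i in keep else "-" for i in range(l))
            counts[pattern] += 1
    return counts


def gkm_cosine(a: str, b: str, l: int = 6, k: int = 4) -> float:
    """Hypothetical similarity: cosine of the two gapped k-mer count vectors."""
    ca = gapped_kmer_counts(a, l, k)
    cb = gapped_kmer_counts(b, l, k)
    # Counter returns 0 for absent keys, so unshared patterns add nothing.
    dot = sum(v * cb[key] for key, v in ca.items())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a sequence shorter than l has no features
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    human = "ACGTACGTAC"
    mouse = "ACGTTCGTAC"  # single substitution relative to `human`
    print(gkm_cosine(human, human))  # identical sequences: 1.0 up to rounding
    print(gkm_cosine(human, mouse))  # a point mutation lowers the score
```

Because a substitution only affects the patterns that keep the mutated position, gapped features degrade gradually under point mutations. This is one intuition for why they may tolerate rapid enhancer sequence turnover better than exact k-mer or base-level matching.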