School of Biological Sciences, Institute of Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, GA, United States of America.
College of Nursing, The Research Institute of Nursing Science, Seoul National University, Seoul, Korea.
PLoS Comput Biol. 2018 Oct 5;14(10):e1006451. doi: 10.1371/journal.pcbi.1006451. eCollection 2018 Oct.
Recent advances in epigenomics have made it possible to map regulatory regions genome-wide using empirical methods. Subsequent comparative epigenomic studies have revealed that regulatory regions diverge rapidly between the genomes of different species, and that this divergence is more pronounced in enhancers than in promoters. To understand the genomic changes underlying these patterns, we investigated whether we could identify specific sequence fragments that are over-enriched in regulatory regions and thus potentially contribute to the regulatory functions of those regions. Here we report numerous sequence fragments, which we refer to as 'sequence determinants', that are statistically over-enriched in the enhancers and promoters of different mammals. Interestingly, the degree of statistical enrichment, which is presumably associated with the regulatory impact of a given sequence determinant, was significantly higher for promoter sequence determinants than for enhancer sequence determinants. We further used a machine learning method to construct prediction models from sequence determinants. Remarkably, prediction models constructed from one species could predict the regulatory regions of other species with high accuracy. This observation indicates that even though the precise locations of regulatory regions diverge rapidly during evolution, the functional potential of the sequence determinants underlying regulatory sequences may be conserved between species.
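The abstract does not say how over-enrichment was measured, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that sequence fragments are fixed-length k-mers and scores each one by the log2 ratio of the fraction of regulatory (foreground) sequences containing it to the fraction of background sequences containing it, with a pseudocount to avoid division by zero. The function names, the choice of k, and the pseudocount are all hypothetical.

```python
from collections import Counter
from math import log2


def kmers_in(seq, k):
    """Return the set of distinct k-mers present in one sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_enrichment(foreground, background, k=6, pseudocount=1.0):
    """Score every observed k-mer by a log2 enrichment ratio.

    Assumption (not from the source): enrichment is the ratio of the
    fraction of foreground sequences containing the k-mer to the
    fraction of background sequences containing it.
    """
    fg_counts = Counter()
    bg_counts = Counter()
    for seq in foreground:
        fg_counts.update(kmers_in(seq, k))  # count each k-mer once per sequence
    for seq in background:
        bg_counts.update(kmers_in(seq, k))

    scores = {}
    for kmer in set(fg_counts) | set(bg_counts):
        fg_frac = (fg_counts[kmer] + pseudocount) / (len(foreground) + 2 * pseudocount)
        bg_frac = (bg_counts[kmer] + pseudocount) / (len(background) + 2 * pseudocount)
        scores[kmer] = log2(fg_frac / bg_frac)
    return scores


if __name__ == "__main__":
    # Toy sequences invented for illustration only.
    regulatory = ["ACGTGGGCGG", "TTGGGCGGAA", "GGGCGGTTTT"]
    control = ["ATATATATAT", "TTTTAAAATT", "ACACACACAC"]
    scores = kmer_enrichment(regulatory, control, k=6)
    for kmer, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
        print(kmer, round(score, 3))
```

In a cross-species setting of the kind the abstract describes, scores like these could be computed on one species and then used as features for a classifier evaluated on another; a real analysis would also need a proper significance test and correction for multiple comparisons.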