McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA.
Genome Res. 2012 Nov;22(11):2290-301. doi: 10.1101/gr.139360.112. Epub 2012 Sep 27.
We take a comprehensive approach to the study of regulatory control of gene expression in melanocytes that proceeds from large-scale enhancer discovery facilitated by ChIP-seq; to rigorous validation in silico, in vitro, and in vivo; and finally to the use of machine learning to elucidate a regulatory vocabulary with genome-wide predictive power. We identify 2489 putative melanocyte enhancer loci in the mouse genome by ChIP-seq for EP300 and H3K4me1. We demonstrate that these putative enhancers are evolutionarily constrained, enriched for sequence motifs predicted to bind key melanocyte transcription factors, located near genes relevant to melanocyte biology, and capable of driving reporter gene expression in melanocytes in culture (86%; 43/50) and in transgenic zebrafish (70%; 7/10). Next, using the sequences of these putative enhancers as a training set for a supervised machine learning algorithm, we develop a vocabulary of 6-mers predictive of melanocyte enhancer function. Lastly, we demonstrate that this vocabulary has genome-wide predictive power in both the mouse and human genomes. This study provides deep insight into the regulation of gene expression in melanocytes and demonstrates a powerful approach to the investigation of regulatory sequences that can be applied to other cell types.
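The abstract does not name the learning algorithm or say how the 6-mers were turned into features, so the following is only a minimal sketch of one common way to represent a sequence for such a classifier: counting every 6-mer and merging each with its reverse complement. The function names (`reverse_complement`, `canonical`, `kmer_counts`) and the choice to skip windows containing ambiguous bases are illustrative assumptions, not the authors' method.

```python
from collections import Counter

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Pick one representative for a k-mer and its reverse complement.

    Assumption: the two strands are treated as equivalent, so the
    lexicographically smaller of the pair is used as the feature name.
    """
    return min(kmer, reverse_complement(kmer))


def kmer_counts(seq, k=6):
    """Count canonical k-mers in a sequence (k=6 matches the 6-mers above).

    Windows containing anything other than A, C, G, T are skipped
    (an assumption made for this sketch).
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts


if __name__ == "__main__":
    # Toy sequence, not taken from the study.
    print(kmer_counts("ACGTACGT"))
```

Count vectors of this kind, computed for the putative enhancers and for a set of control sequences, could then be passed to any supervised classifier; the per-6-mer weights the classifier learns would play the role of the "vocabulary" described above.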