Department of Biochemistry and Molecular Biology, Center for Comparative Genomics and Bioinformatics, 304 Wartik Laboratory, The Pennsylvania State University, University Park, Pennsylvania 16802, USA.
Nat Rev Genet. 2012 Jun 18;13(7):469-83. doi: 10.1038/nrg3242.
Differential gene expression is the fundamental mechanism underlying animal development and cell differentiation. However, comprehensively and accurately identifying the DNA sequences that are required to regulate gene expression, namely cis-regulatory modules (CRMs), remains a challenge. Three major features are used, singly or in combination, to predict CRMs: clusters of transcription factor binding site motifs; non-coding DNA that is under evolutionary constraint; and biochemical marks associated with CRMs, such as histone modifications and protein occupancy. Validation rates for these predictions indicate that identifying diagnostic biochemical marks is the most reliable method. Analysing the motifs and conservation patterns within the predicted CRMs then further enhances our understanding of them.
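As a rough illustration of the first feature (clusters of transcription factor binding site motifs), the sketch below scans a sequence on both strands for exact matches to short consensus strings. It reports windows containing at least a minimum number of matches and merges overlapping windows into candidate regions. This is a toy example under stated assumptions, not the method of the review or of any specific tool. The exact-match consensus strings (real scanners typically use position weight matrices), the `GATAA` motif, the window size and the hit threshold are all hypothetical choices made for illustration.

```python
import re
from typing import List, Tuple

def revcomp(s: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(s.upper()))

def motif_positions(seq: str, motifs: List[str]) -> List[int]:
    """Sorted start positions of exact matches to any motif on either strand.

    Assumption: motifs are plain consensus strings, not weight matrices.
    """
    seq = seq.upper()
    hits = set()
    for m in motifs:
        for pattern in {m.upper(), revcomp(m)}:
            # A zero-width lookahead lets overlapping matches be reported.
            for match in re.finditer(f"(?={re.escape(pattern)})", seq):
                hits.add(match.start())
    return sorted(hits)

def motif_clusters(seq: str, motifs: List[str], window: int = 100,
                   min_hits: int = 3) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_hits) for candidate motif-cluster regions.

    A window anchored at each hit is kept if it contains >= min_hits hits;
    overlapping kept windows are merged. window and min_hits are
    hypothetical parameters, not values taken from the source.
    """
    positions = motif_positions(seq, motifs)
    windows = []
    for i, start in enumerate(positions):
        # Positions are sorted, so count later hits starting inside the window.
        n = sum(1 for p in positions[i:] if p < start + window)
        if n >= min_hits:
            windows.append((start, min(start + window, len(seq))))
    # Merge overlapping windows into single candidate regions.
    merged: List[Tuple[int, int]] = []
    for s, e in windows:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return [(s, e, sum(1 for p in positions if s <= p < e)) for s, e in merged]

if __name__ == "__main__":
    # Synthetic sequence: three nearby matches (one on the reverse strand)
    # and one isolated match far downstream.
    seq = "A" * 50 + "GATAAGCGATAAGCTTATC" + "A" * 200 + "GATAA" + "A" * 50
    print(motif_clusters(seq, ["GATAA"], window=100, min_hits=3))
```

In the synthetic example, only the tight group of three matches is reported, and the isolated downstream match is ignored. As the abstract notes, predictions of this kind are less reliable on their own than diagnostic biochemical marks, so in practice such motif clusters would be one input combined with conservation and chromatin data.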