Boeva Valentina
Centre de Recherche, Institut Curie, Paris, France; INSERM, U900, Paris, France; Mines ParisTech, Fontainebleau, France; PSL Research University, Paris, France; Department of Development, Reproduction and Cancer, Institut Cochin, Paris, France; INSERM, U1016, Paris, France; Centre National de la Recherche Scientifique UMR 8104, Paris, France; Université Paris Descartes UMR-S1016, Paris, France.
Front Genet. 2016 Feb 23;7:24. doi: 10.3389/fgene.2016.00024. eCollection 2016.
Eukaryotic genomes contain a variety of structured patterns: repetitive elements, binding sites of DNA- and RNA-associated proteins, splice sites, and so on. Often, these structured patterns can be formalized as motifs and described using an appropriate mathematical model such as a position weight matrix or an IUPAC consensus. Two key tasks are typically carried out for motifs when analyzing genomic sequences: identifying, in a set of DNA regions, over-represented motifs from a particular motif database, and discovering over-represented motifs de novo. Here we describe existing methodology to perform these two tasks for motifs characterizing transcription factor binding. When applied to the output of ChIP-seq and ChIP-exo experiments, or to promoter regions of co-modulated genes, motif analysis techniques allow for the prediction of transcription factor binding events and enable identification of transcriptional regulators and co-regulators. This review further illustrates the usefulness of motif analysis by showing how motif discovery improves peak calling in ChIP-seq and ChIP-exo experiments and, when coupled with information on gene expression, provides insights into the physical mechanisms of transcriptional modulation.
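The two motif models named above can be illustrated with a minimal Python sketch. It is not taken from the review: the function names (`find_consensus_hits`, `counts_to_pwm`, `best_pwm_hit`) are hypothetical, the count matrix in the usage note is a made-up toy example, and the uniform background frequency of 0.25 and the pseudocount of 1 are simplifying assumptions. The sketch also assumes upper-case sequences containing only A, C, G and T, and it scans one strand only.

```python
import math
import re

# IUPAC nucleotide codes mapped to the bases each code stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_consensus_hits(consensus, sequence):
    """Return 0-based start positions of all (possibly overlapping) matches
    of an IUPAC consensus string on the given strand."""
    body = "".join("[" + IUPAC[code] + "]" for code in consensus.upper())
    # A lookahead lets the regex engine report overlapping occurrences.
    pattern = re.compile("(?=" + body + ")")
    return [match.start() for match in pattern.finditer(sequence.upper())]


def counts_to_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds weights.

    `counts` is a list with one dict per motif position, e.g.
    {"A": 8, "C": 0, "G": 0, "T": 0}. A uniform background is assumed.
    """
    pwm = []
    for column in counts:
        total = sum(column[base] + pseudocount for base in "ACGT")
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def best_pwm_hit(pwm, sequence):
    """Slide the matrix along the sequence and return (start, score) of the
    highest-scoring window, or None if the sequence is shorter than the motif."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(pwm[i][sequence[start + i]] for i in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best
```

For instance, `find_consensus_hits("CANNTG", "CAGCTGCATATG")` reports the two windows matching the degenerate consensus, while `best_pwm_hit(counts_to_pwm(toy_counts), "TTACGTT")` returns the start and log-odds score of the best window for a toy count matrix `toy_counts`. The consensus model treats every allowed base as equally good, whereas the matrix model grades each base at each position, which is why the two can disagree on borderline sites.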