Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8900 Rockville Pike, Bethesda, MD 20894, USA.
Gene. 2012 Sep 10;506(1):125-34. doi: 10.1016/j.gene.2012.06.005. Epub 2012 Jun 10.
Understanding gene regulation is a major objective in molecular biology research. Transcription is frequently driven by transcription factors (TFs) that bind to specific DNA sequence motifs. These motifs are usually short and degenerate, so multiple copies are likely to occur throughout the genome by chance alone. Despite this, TFs bind only a small subset of these sites, which prompted us to investigate the differences between motifs that are bound by TFs and those that remain unbound. Here we constructed vectors representing various chromatin- and sequence-based features for a published set of bound and unbound motifs representing nine TFs in the budding yeast Saccharomyces cerevisiae. Using a machine learning approach, we identified a set of features that can be used to discriminate between bound and unbound motifs. We also discovered that some TFs bind most or all of their strong motifs in intergenic regions. Our data demonstrate that the local sequence context around bound motifs can be strikingly different from that around unbound motifs. We conclude that multiple combinations of genomic features characterize bound or unbound motifs.
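The point that short, degenerate motifs are expected to recur by chance can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the study: it assumes equal base frequencies and independent positions, uses a rounded genome length of 12 Mb for S. cerevisiae as an approximation, and the consensus strings and function names are illustrative choices rather than the published motif set.

```python
# Rough estimate of how often a motif is expected to occur by chance.
# Assumptions (not from the abstract): uniform base composition (25% each),
# independent positions, and an approximate genome length.

# Number of bases each IUPAC nucleotide code can stand for.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def match_probability(consensus):
    """Probability that a random site matches the IUPAC consensus."""
    p = 1.0
    for code in consensus.upper():
        p *= IUPAC_DEGENERACY[code] / 4.0
    return p


def expected_chance_matches(consensus, genome_length, both_strands=True):
    """Expected number of chance matches in a random genome.

    Scanning both strands doubles the count. For reverse-palindromic
    motifs both strands match the same sites, so pass both_strands=False.
    """
    positions = genome_length - len(consensus) + 1
    strands = 2 if both_strands else 1
    return positions * strands * match_probability(consensus)


if __name__ == "__main__":
    GENOME_LENGTH = 12_000_000  # rounded, assumed value for budding yeast
    # Illustrative consensus strings only; not the paper's dataset.
    for motif, both in (("CACGTG", False), ("TGACTCA", True), ("TGASTCA", True)):
        n = expected_chance_matches(motif, GENOME_LENGTH, both_strands=both)
        print(f"{motif}: about {n:.0f} expected chance matches")
```

Under these simplifying assumptions, motifs of six or seven bases are expected to match at thousands of positions in a genome of this size, which is the background against which the small bound subset has to be explained. Real genomes have skewed base composition, so the figures are only an order-of-magnitude guide.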