Tharakaraman Kannan, Bodenreider Olivier, Landsman David, Spouge John L, Mariño-Ramírez Leonardo
Computational Biology Branch, National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, MSC 6075 Bethesda, MD 20894-6075, USA.
Nucleic Acids Res. 2008 May;36(8):2777-86. doi: 10.1093/nar/gkn137. Epub 2008 Mar 26.
A number of previous studies have predicted transcription factor binding sites (TFBSs) by exploiting the position of genomic landmarks like the transcriptional start site (TSS). The studies' methods are generally too computationally intensive for genome-scale investigation, so the full potential of 'positional regulomics' to discover TFBSs and determine their function remains unknown. Because databases often annotate the genomic landmarks in DNA sequences, the methodical exploitation of positional regulomics has become increasingly urgent. Accordingly, we examined a set of 7914 human putative promoter regions (PPRs) with a known TSS. Our methods identified 1226 eight-letter DNA words with significant positional preferences with respect to the TSS, of which only 608 matched known TFBSs. Nevertheless, many groups of genes whose PPRs contained a common word displayed similar expression profiles and related biological functions. Most interestingly, our results included 78 words, each of which clustered significantly in two or three different positions relative to the TSS. Often, the gene groups associated with different positional clusters of the same word had diverse functions, e.g. activation or repression in different tissues. Thus, different clusters of the same word likely reflect the phenomenon of 'positional regulation', i.e. a word's regulatory function can vary with its position relative to a genomic landmark, a conclusion inaccessible to methods based purely on sequence. Further integrative analysis of words co-occurring in PPRs also yielded 24 different groups of genes, likely identifying cis-regulatory modules de novo. Whereas comparative genomics requires precise sequence alignments, positional regulomics exploits genomic landmarks to provide a 'poor man's alignment'. By exploiting the phenomenon of positional regulation, it uses position to differentiate the biological functions of subsets of TFBSs sharing a common sequence motif.
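The idea of an eight-letter word having a positional preference relative to the TSS can be illustrated with a minimal sketch. This is not the authors' method, which the abstract does not describe. It assumes equal-length promoter sequences aligned on the TSS, and the function names, the window size and the simple binomial test are all hypothetical choices made for illustration. The p-value is not corrected for scanning over many windows or many words.

```python
from collections import defaultdict
from math import comb


def word_positions(sequences, k=8):
    """Map each k-letter DNA word to the start offsets where it occurs.

    Assumption: all sequences have the same length and are aligned on the
    TSS, so an offset means the same thing in every sequence.
    """
    positions = defaultdict(list)
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= set("ACGT"):  # skip words with N or other symbols
                positions[word].append(i)
    return positions


def best_window(offsets, n_positions, window):
    """Return (start, count) for the window of offsets with the most hits."""
    counts = [0] * n_positions
    for o in offsets:
        counts[o] += 1
    current = sum(counts[:window])
    best_start, best_count = 0, current
    for start in range(1, n_positions - window + 1):
        # Slide the window one position to the right.
        current += counts[start + window - 1] - counts[start - 1]
        if current > best_count:
            best_start, best_count = start, current
    return best_start, best_count


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def positional_preference(offsets, n_positions, window=20):
    """Score how strongly a word's occurrences cluster in one window.

    Illustrative null model: each occurrence falls uniformly at random
    among the n_positions possible start offsets.
    """
    start, count = best_window(offsets, n_positions, window)
    p_window = min(1.0, window / n_positions)
    return start, count, binomial_tail(len(offsets), count, p_window)


if __name__ == "__main__":
    # Toy data: the same word planted at offset 10 in five 30-letter sequences.
    toy = ["C" * 10 + "TATAAAAG" + "C" * 12] * 5
    pos = word_positions(toy)
    n_positions = len(toy[0]) - 8 + 1
    print(positional_preference(pos["TATAAAAG"], n_positions, window=5))
```

A word whose occurrences pile up in one narrow window gets a small tail probability under the uniform null, which is the kind of signal the abstract calls a positional preference. A genome-scale analysis would need a more careful null model and correction for multiple testing.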