Proteomics Research Unit, Centre of Basic Research II, Biomedical Research Foundation, Academy of Athens, Athens, Greece.
Cancer Genomics Proteomics. 2009 Nov-Dec;6(6):337-55.
Promoter regions of the human genome are central to understanding the regulatory mechanisms underlying physiological and disease states. The aim of this study was to investigate the sequence positional properties of experimentally verified human promoters. To this end, we identified short sequence elements, ranging from 4- to 9-mers, that show position dominance close to, or away from, the transcription start site (TSS). Rigorous statistical criteria were applied, and we examined whether position dominance is related to transcription control. We designed and implemented a dedicated filtering method to detect, on a large scale, position-dominant sequence elements embedded in the promoter set. Additionally, via a high-throughput procedure, we gathered data on the majority of the publicly available transcription factor-binding sites (TFBSs) and matched them to our findings, aiming at a large-scale correlation between position-dominant sequence elements and TFBSs. In this analysis, we present unique compositional and conservational perturbations at the TSS and the core promoter region. Using our filtering method, 7,088 short sequences ranging from 4- to 9-mers were found to present strong positional dominance close to or away from the TSS, and these short sequences were matched to a large number of known TFBSs. Moreover, using probability theory, evidence is presented showing that TFBSs tend to present strong positional preferences. In addition, we demonstrate that the actual TFBS copy number is related to the transcription regulatory process. On the basis of this last argument, it is suggested that all the detected short sequences that did not match any known TFBS have a high potential for being novel transcription control elements.
Furthermore, using a well-described 'high potential cancer biomarker resource', we attempted to identify position-dominant sequence elements associated with cancer, as derived from their presence in the promoters of cancer-related proteins.
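The abstract does not specify the filtering method, so the following Python sketch only illustrates the general idea of testing whether a k-mer is over-represented near the TSS. Every name in it (`kmer_region_counts`, `binomial_upper_tail`, `position_dominant_kmers`, `tss_index`, `window`, `alpha`) is hypothetical, and the one-sided binomial test is an assumed stand-in for the study's statistical criteria, not a reproduction of them. It assumes equal-length promoter sequences aligned so that the TSS sits at the same index in each, applies no multiple-testing correction, and is meant for small toy inputs only.

```python
from collections import Counter
from math import comb


def kmer_region_counts(promoters, k, tss_index, window):
    """Count k-mer start positions within `window` of the TSS vs. elsewhere."""
    near, far = Counter(), Counter()
    for seq in promoters:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if abs(i - tss_index) <= window:
                near[kmer] += 1
            else:
                far[kmer] += 1
    return near, far


def binomial_upper_tail(successes, trials, p):
    """Return P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, j) * p**j * (1 - p) ** (trials - j)
        for j in range(successes, trials + 1)
    )


def position_dominant_kmers(promoters, k, tss_index, window, alpha=0.01):
    """Return {kmer: p_value} for k-mers enriched near the TSS (assumed test)."""
    near, far = kmer_region_counts(promoters, k, tss_index, window)
    # Assumption: all promoters have the same length and TSS index.
    n_positions = len(promoters[0]) - k + 1
    near_positions = sum(
        1 for i in range(n_positions) if abs(i - tss_index) <= window
    )
    # Null hypothesis: occurrences fall uniformly over all start positions.
    p_near = near_positions / n_positions
    hits = {}
    for kmer in set(near) | set(far):
        total = near[kmer] + far[kmer]
        p_value = binomial_upper_tail(near[kmer], total, p_near)
        if p_value < alpha:
            hits[kmer] = p_value
    return hits


# Toy input: ten identical 20-nt "promoters" with TATA starting at index 8.
toy_promoters = ["GGGGGGGGTATAGGGGGGGG"] * 10
toy_hits = position_dominant_kmers(toy_promoters, k=4, tss_index=10, window=2)
print(sorted(toy_hits))
```

Dominance away from the TSS could be tested in the same way with the lower tail of the same distribution. Any real analysis over thousands of k-mers would also need a multiple-testing correction.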