Vega Vinsensius B, Lin Chin-Yo, Lai Koon Siew, Kong Say Li, Xie Min, Su Xiaodi, Teh Huey Fang, Thomsen Jane S, Yeo Ai Li, Sung Wing Kin, Bourque Guillaume, Liu Edison T
Estrogen Receptor Biology Program, Genome Institute of Singapore, 60 Biopolis Street, Republic of Singapore 138672.
Genome Biol. 2006;7(9):R82. doi: 10.1186/gb-2006-7-9-r82.
Transcription factor binding sites (TFBSs) impart specificity to cellular transcriptional responses and have largely been defined by consensus motifs derived from a handful of validated sites. The low specificity of computational TFBS predictions has been attributed to the ubiquity of the motifs and to the relaxed sequence requirements for binding. We posited that this inadequacy stems from the limited number of empirically verified sites used as input, and we demonstrated a multiplatform approach to constructing a robust model.
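The ubiquity argument can be made concrete with a back-of-the-envelope count. The sketch below asks how many windows in a random sequence would match the 10 fixed bases of the widely used 13-bp ERE consensus GGTCAnnnTGACC with up to k mismatches. It assumes independent bases at equal frequency (0.25) and a genome length of about 3.1 Gb. Both are simplifying assumptions, not figures from the study, and the function names are hypothetical.

```python
from math import comb


def match_probability(n_fixed: int, max_mismatches: int, p_base: float = 0.25) -> float:
    """Probability that a random window matches n_fixed specified bases with at
    most max_mismatches substitutions, assuming independent bases (assumption)."""
    return sum(
        comb(n_fixed, k) * (1 - p_base) ** k * p_base ** (n_fixed - k)
        for k in range(max_mismatches + 1)
    )


def expected_hits(genome_length: int, n_fixed: int, max_mismatches: int) -> float:
    """Expected number of matching windows in a random genome of the given length.

    GGTCAnnnTGACC is its own reverse complement, so a window has the same
    mismatch count on either strand; counting one strand avoids double-counting.
    """
    return genome_length * match_probability(n_fixed, max_mismatches)


GENOME_LENGTH = 3_100_000_000  # approximate haploid human genome size (assumption)
CORE_FIXED = 10                # GGTCA + TGACC; the 3-bp spacer is unconstrained

for k in range(3):
    print(f"<= {k} mismatches: ~{expected_hits(GENOME_LENGTH, CORE_FIXED, k):,.0f} sites")
```

Under these assumptions, the random expectation is roughly 3,000 exact matches. It rises to about 92,000 with one mismatch and to over a million with two. This is why relaxed matching to the consensus alone yields many candidate sites.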
Using the TFBS for estrogen receptor alpha (ERα), the estrogen response element (ERE), as a model system, we extracted EREs from multiple molecular and genomic platforms whose binding to ERα had been experimentally confirmed or rejected. In silico analyses revealed significant sequence information flanking the standard binding consensus, which discriminates ERE-like sequences that bind ERα from those that do not. We extended the ERE consensus by three bases on each side, with an initiator C at the third position 5' of the consensus and a terminal G at the third position 3', and further validated these flanking positions using surface plasmon resonance spectroscopy. Our functional human ERE prediction algorithm (h-ERE) outperformed existing predictive algorithms and produced fewer than 5% false negatives upon experimental validation.
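To illustrate how the flanking bases add information, the sketch below scans a sequence for the extended consensus as we read it from the abstract: CnnGGTCAnnnTGACCnnG. This is the 13-bp core with three bases added on each side, the outermost being a C at the 5' end and a G at the 3' end. It is a plain consensus-matching scan with a mismatch tolerance on the core, written for illustration. It is not the h-ERE scoring model, whose details are not given here. The names `EXTENDED_ERE`, `Hit` and `scan_extended_ere` and the toy sequence are hypothetical.

```python
from typing import NamedTuple

# Hypothetical extended-ERE pattern (our reading of the abstract): three bases
# added on each side of the GGTCAnnnTGACC core, C as the 5'-most base, G as the 3'-most.
EXTENDED_ERE = "CNNGGTCANNNTGACCNNG"
FLANK_POSITIONS = {0, len(EXTENDED_ERE) - 1}  # the C and G outside the core


class Hit(NamedTuple):
    start: int            # 0-based start of the 19-bp window
    core_mismatches: int  # mismatches among the 10 fixed core bases
    flanks_match: int     # how many of the two flanking bases (C, G) match


def scan_extended_ere(seq: str, max_core_mismatches: int = 1) -> list[Hit]:
    """Report windows whose core is within max_core_mismatches of the consensus.

    The pattern is its own reverse complement, so a single-strand scan
    already finds sites on both strands.
    """
    seq = seq.upper()
    width = len(EXTENDED_ERE)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        core_mm = 0
        flanks = 0
        for i, (expected, observed) in enumerate(zip(EXTENDED_ERE, window)):
            if expected == "N":
                continue  # unconstrained spacer or flank position
            if i in FLANK_POSITIONS:
                flanks += observed == expected
            elif observed != expected:
                core_mm += 1
        if core_mm <= max_core_mismatches:
            hits.append(Hit(start, core_mm, flanks))
    return hits


example = "AAACAGGGTCAGGCTGACCCTGTTT"
for hit in scan_extended_ere(example):
    print(hit)  # Hit(start=3, core_mismatches=0, flanks_match=2)
```

In the toy sequence, only the window starting at position 3 is reported. It has an exact core and both flanking bases present. Mutating the flanking C and G leaves the core hit in place but drops `flanks_match` to 0. That is the kind of distinction the flanking information is meant to capture.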
Building on a larger set of experimentally validated EREs, the h-ERE algorithm is better able to demarcate the universe of ERE-like sequences that are potential ER binders. Only 14% of the predicted optimal binding sites were utilized under the experimental conditions employed, pointing to selective criteria unrelated to the ERE sequence itself. Factors other than the primary nucleotide sequence will ultimately determine binding site selection.
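A utilization figure like the 14% above can be expressed as an overlap statistic: the share of predicted sites that fall within regions observed to be bound experimentally. The following is a minimal sketch of that computation. It assumes predicted sites and bound regions are given as half-open (chromosome, start, end) intervals. The function name and the toy intervals are hypothetical and are not data from the study.

```python
from bisect import bisect_right

Interval = tuple[str, int, int]  # (chromosome, start, end), half-open


def fraction_utilized(predicted: list[Interval], bound: list[Interval]) -> float:
    """Fraction of predicted sites overlapping at least one bound region."""
    if not predicted:
        return 0.0
    by_chrom: dict[str, list[tuple[int, int]]] = {}
    for chrom, start, end in bound:
        by_chrom.setdefault(chrom, []).append((start, end))
    # Sort and merge bound regions per chromosome so each query is a binary search.
    merged: dict[str, tuple[list[int], list[int]]] = {}
    for chrom, regions in by_chrom.items():
        regions.sort()
        starts: list[int] = []
        ends: list[int] = []
        for start, end in regions:
            if ends and start <= ends[-1]:
                ends[-1] = max(ends[-1], end)
            else:
                starts.append(start)
                ends.append(end)
        merged[chrom] = (starts, ends)
    used = 0
    for chrom, start, end in predicted:
        if chrom not in merged:
            continue
        starts, ends = merged[chrom]
        i = bisect_right(starts, end - 1) - 1  # last merged region starting before `end`
        if i >= 0 and ends[i] > start:
            used += 1
    return used / len(predicted)


# Toy intervals for illustration only.
predicted_sites = [("chr1", 100, 119), ("chr1", 500, 519), ("chr2", 50, 69), ("chr3", 10, 29)]
bound_regions = [("chr1", 90, 150), ("chr2", 60, 200)]
print(fraction_utilized(predicted_sites, bound_regions))  # 0.5
```

Merging the bound regions first makes them disjoint and sorted. Each predicted site then needs only one binary search against the last region that starts before the site ends.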