Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA.
Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA.
Nat Commun. 2023 Apr 22;14(1):2333. doi: 10.1038/s41467-023-37960-5.
The gene regulatory code and grammar remain largely unknown, precluding our ability to link phenotype to genotype in regulatory sequences. Here, using a massively parallel reporter assay (MPRA) of 209,440 sequences, we examine all possible pair and triplet combinations, permutations and orientations of eighteen liver-associated transcription factor binding sites (TFBS). We find that TFBS orientation and order have a major effect on gene regulatory activity. Corroborating these results with genomic analyses, we find clear human promoter TFBS orientation biases and similar TFBS orientation and order transcriptional effects in an MPRA that tested 164,307 liver candidate regulatory elements. Additionally, by adding TFBS orientation to a model that predicts expression from sequence, we improve performance by 7.7%. Collectively, our results show that TFBS orientation and order have a significant effect on gene regulatory activity and need to be considered when analyzing the functional effect of variants on the activity of these sequences.
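To make the combinatorial design concrete, the following is a minimal sketch of how ordered, oriented pair and triplet arrangements of eighteen binding sites could be enumerated. It is an illustration under stated assumptions, not the authors' library design: the site names are hypothetical placeholders (the abstract does not list the eighteen TFBS), each arrangement is assumed to use distinct sites, and each site is assumed to take one of two orientations.

```python
from itertools import permutations, product

# Hypothetical placeholder names: the abstract does not list the eighteen TFBS.
TFBS = [f"TF{i:02d}" for i in range(1, 19)]

# Assumption: each site is placed either forward ("+") or reverse-complemented ("-").
ORIENTATIONS = ("+", "-")


def enumerate_arrangements(sites, k):
    """Yield every ordered arrangement of k distinct sites,
    with each site in either orientation."""
    for ordered in permutations(sites, k):              # order matters
        for strands in product(ORIENTATIONS, repeat=k):  # one strand per site
            yield tuple(zip(ordered, strands))


pairs = list(enumerate_arrangements(TFBS, 2))
triplets = list(enumerate_arrangements(TFBS, 3))

print(len(pairs))     # 18 * 17 * 2**2 = 1224
print(len(triplets))  # 18 * 17 * 16 * 2**3 = 39168
```

These counts describe only this simplified scheme and are not expected to equal the 209,440 sequences in the assay, whose total depends on design details the abstract does not state.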