Department of Molecular Medicine, National Public Health Institute (KTL) and Genome-Scale Biology Program, Institute of Biomedicine and High Throughput Center, University of Helsinki, Biomedicum, Helsinki, Finland.
Genome Res. 2010 Jun;20(6):861-73. doi: 10.1101/gr.100552.109. Epub 2010 Apr 8.
The genetic code, the binding specificity of all transfer RNAs, defines how protein primary structure is determined by DNA sequence. DNA also dictates when and where proteins are expressed, and this information is encoded in a pattern of specific sequence motifs that are recognized by transcription factors. However, the DNA-binding specificity is only known for a small fraction of the approximately 1400 human transcription factors (TFs). We describe here a high-throughput method for analyzing transcription factor binding specificity that is based on systematic evolution of ligands by exponential enrichment (SELEX) and massively parallel sequencing. The method is optimized for analysis of large numbers of TFs in parallel through the use of affinity-tagged proteins, barcoded selection oligonucleotides, and multiplexed sequencing. Data are analyzed by a new bioinformatic platform that uses the hundreds of thousands of sequencing reads obtained to control the quality of the experiments and to generate binding motifs for the TFs. The described technology allows higher throughput and identification of much longer binding profiles than current microarray-based methods. In addition, as our method is based on proteins expressed in mammalian cells, it can also be used to characterize DNA-binding preferences of full-length proteins or proteins requiring post-translational modifications. We validate the method by determining binding specificities of 14 different classes of TFs and by confirming the specificities for NFATC1 and RFX3 using ChIP-seq. Our results reveal unexpected dimeric modes of binding for several factors that were thought to preferentially bind DNA as monomers.
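The abstract's analysis idea (sort multiplexed reads by barcode, then derive a binding motif from the selected sequences) can be illustrated with a toy sketch. This is not the authors' bioinformatic platform. It assumes, purely for illustration, that each read begins with a fixed-length barcode followed directly by the selected insert, and that a motif can be approximated by aligning the best near-match to a seed sequence in each insert. All function names, the barcode layout, and the one-mismatch threshold are hypothetical choices.

```python
from collections import Counter, defaultdict

BASES = "ACGT"


def demultiplex(reads, barcode_to_tf, barcode_len):
    """Group reads by their leading barcode (assumed layout: barcode + insert).

    Returns {tf_name: [insert sequences]}. Reads whose barcode is not in
    barcode_to_tf are dropped.
    """
    by_tf = defaultdict(list)
    for read in reads:
        tf = barcode_to_tf.get(read[:barcode_len])
        if tf is not None:
            by_tf[tf].append(read[barcode_len:])
    return dict(by_tf)


def top_kmer(seqs, k):
    """Return the most frequent k-mer across all sequences, or None if empty."""
    counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    return counts.most_common(1)[0][0] if counts else None


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_matrix(seqs, seed, max_mismatches=1):
    """Build a position count matrix around a seed.

    For each sequence, take the window closest to the seed (first one on
    ties) and count its bases if it is within max_mismatches of the seed.
    """
    n = len(seed)
    matrix = [dict.fromkeys(BASES, 0) for _ in range(n)]
    for s in seqs:
        windows = [s[i:i + n] for i in range(len(s) - n + 1)]
        if not windows:
            continue
        best = min(windows, key=lambda w: hamming(w, seed))
        if hamming(best, seed) <= max_mismatches:
            for pos, base in enumerate(best):
                if base in matrix[pos]:  # skip ambiguous bases such as N
                    matrix[pos][base] += 1
    return matrix


# Made-up reads: a 2-base barcode followed by a short insert.
reads = ["AAGGAAATT", "AAGGAAACC", "AAGGTAAGG", "CCTTTTTTT", "GGACGTACG"]
inserts = demultiplex(reads, {"AA": "TF_A", "CC": "TF_B"}, barcode_len=2)
matrix = count_matrix(inserts["TF_A"], seed="GGAA")
```

In practice the seed could come from `top_kmer`, and the counts would be normalized per position to give base frequencies. Real reads would also need adapter trimming and quality filtering, which this sketch omits.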