Glenwinkel Lori, Wu Di, Minevich Gregory, Hobert Oliver
Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Columbia University Medical Center, New York, New York 10032.
Genetics. 2014 May;197(1):61-76. doi: 10.1534/genetics.113.160721. Epub 2014 Feb 20.
The identification of the regulatory targets of transcription factors is central to our understanding of how transcription factors fulfill their many key roles in development and homeostasis. DNA-binding sites have been uncovered for many transcription factors through a number of experimental approaches, but it has proven difficult to use this binding site information to reliably predict transcription factor target genes in genomic sequence space. Using the nematode Caenorhabditis elegans and other related nematode species as a starting point, we describe here a bioinformatic pipeline, TargetOrtho, that identifies potential transcription factor target genes from genomic sequences. Among the key features of this pipeline is the use of sequence conservation of transcription-factor-binding sites in related species. Rather than using aligned genomic DNA sequences from the genomes of multiple species as a starting point, TargetOrtho scans related genome sequences independently for matches to user-provided transcription-factor-binding motifs, assigns motif matches to adjacent genes, and then determines whether orthologous genes in different species also contain motif matches. We validate TargetOrtho by identifying previously characterized targets of three different types of transcription factors in C. elegans, and we use TargetOrtho to identify novel target genes of the Collier/Olf/EBF transcription factor UNC-3 in C. elegans ventral nerve cord motor neurons. We have also implemented the use of TargetOrtho in D. melanogaster, using conservation among five species in the D. melanogaster species subgroup for target gene discovery.
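The three steps named in the abstract (scan each genome independently, assign motif matches to adjacent genes, then check orthologous genes for matches) can be illustrated with a minimal Python sketch. This is not the TargetOrtho implementation: the function names, the IUPAC-string motif representation, the nearest-gene-start assignment rule, the `max_distance` cutoff, and the data layout are all assumptions made for illustration only.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    """Reverse complement of an IUPAC motif string."""
    return motif.translate(_COMPLEMENT)[::-1]


def scan(sequence, motif):
    """Return sorted 0-based start positions of motif matches on either strand."""
    hits = set()
    for m in (motif, reverse_complement(motif)):
        # A lookahead lets overlapping matches be reported as well.
        pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in m) + "))")
        hits.update(match.start() for match in pattern.finditer(sequence))
    return sorted(hits)


def genes_with_hits(hits, gene_starts, max_distance=1000):
    """Assign each hit to the gene with the nearest start (hypothetical rule)."""
    targets = set()
    if not gene_starts:
        return targets
    for pos in hits:
        name, start = min(gene_starts.items(), key=lambda kv: abs(kv[1] - pos))
        if abs(start - pos) <= max_distance:
            targets.add(name)
    return targets


def conserved_targets(motif, genomes, orthologs, reference, max_distance=1000):
    """Map each reference-species target gene to the species whose ortholog
    also carries a motif match. No genome alignment is used: every genome
    is scanned on its own and only the gene-level results are compared."""
    per_species = {
        species: genes_with_hits(
            scan(data["sequence"], motif), data["genes"], max_distance
        )
        for species, data in genomes.items()
    }
    result = {}
    for gene in sorted(per_species[reference]):
        result[gene] = sorted(
            species
            for species, ortholog in orthologs.get(gene, {}).items()
            if ortholog in per_species.get(species, set())
        )
    return result
```

Here `genomes` is assumed to map a species name to a dictionary holding a `"sequence"` string and a `"genes"` dictionary of gene name to start coordinate, and `orthologs` maps each reference gene to its ortholog name in every other species. A position weight matrix with a score threshold would be the more usual motif representation; the regular-expression matcher is used here only to keep the sketch short.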