Krivan W, Wasserman WW
Bioinformatics Unit, Center for Genomics and Bioinformatics, Karolinska Institutet, 17177 Stockholm, Sweden.
Genome Res. 2001 Sep;11(9):1559-66. doi: 10.1101/gr.180601.
The identification and interpretation of the regulatory signals within the human genome remain among the greatest goals and most difficult challenges in genome analysis. The ability to predict the temporal and spatial control of transcription is likely to require a combination of methods to address the contribution of sequence-specific signals, protein-protein interactions and chromatin structure. We present here a new procedure to identify clusters of transcription factor binding sites characteristic of sequence modules experimentally verified to direct transcription selectively to liver cells. This algorithm is sufficiently specific to identify known regulatory sequences in genes selectively expressed in liver, promising acceleration of experimental promoter analysis. In combination with phylogenetic footprinting, this improvement in the specificity of predictions is sufficient to motivate a scan of the human genome. Potential regulatory modules were identified in orthologous human and rodent genomic sequences containing both known and uncharacterized genes.
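The abstract does not spell out how binding-site clusters are detected, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes three made-up consensus strings (`TOY_MOTIFS`, hypothetical placeholders rather than real binding profiles), exact forward-strand matching, and an arbitrary window size and factor count. It omits the phylogenetic footprinting step, which would additionally require an aligned orthologous sequence.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical consensus strings used purely for illustration;
# they are NOT the binding profiles used in the paper.
TOY_MOTIFS: Dict[str, str] = {
    "factorA": "GTTAAT",
    "factorB": "TGTTTG",
    "factorC": "TTGCGC",
}


def find_hits(seq: str, motifs: Dict[str, str]) -> List[Tuple[int, str]]:
    """Return sorted (position, factor) pairs for each forward-strand match."""
    hits = []
    for name, pattern in motifs.items():
        # A lookahead reports overlapping matches as well.
        for match in re.finditer(f"(?={pattern})", seq):
            hits.append((match.start(), name))
    return sorted(hits)


def find_clusters(
    hits: List[Tuple[int, str]], window: int, min_factors: int
) -> List[Tuple[int, int, List[str]]]:
    """Report (first_hit, last_hit, factors) for each window, anchored at a
    hit, in which at least `min_factors` distinct factors have a site
    starting within `window` bp of the anchor."""
    clusters = []
    for i, (start, _) in enumerate(hits):
        in_window = [(pos, name) for pos, name in hits[i:] if pos - start < window]
        factors = {name for _, name in in_window}
        if len(factors) >= min_factors:
            clusters.append((start, in_window[-1][0], sorted(factors)))
    return clusters


# Toy sequence: three different sites close together, then an isolated site.
seq = "AAAGTTAATCCCTGTTTGAAATTGCGCAAA" + "A" * 200 + "GTTAAT"
hits = find_hits(seq, TOY_MOTIFS)
print(find_clusters(hits, window=50, min_factors=3))
# prints [(3, 21, ['factorA', 'factorB', 'factorC'])]
```

Only the tight group of three distinct sites is reported; the isolated site at position 230 is ignored, which illustrates why requiring a cluster is more specific than scoring single sites. A realistic implementation would replace the exact-match strings with scored binding profiles and a statistically trained combination rule.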