Zhang M Q
Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA.
Genome Res. 1998 Mar;8(3):319-26. doi: 10.1101/gr.8.3.319.
Identification of the 5' end of human genes requires identification of functional promoter elements. In silico identification of those elements is difficult because of the hierarchical and modular nature of promoter architecture. To address this problem, I propose a new stepwise strategy: first localize a functional promoter to a 1- to 2-kb (extended promoter) region within a large genomic DNA sequence of 100 kb or larger, then further localize the transcriptional start site (TSS) to a 50- to 100-bp (core promoter) region. Using position-dependent 5-tuple measures, a quadratic discriminant analysis (QDA) method has been implemented in a new program, CorePromoter. Our experiments indicate that when given a 1- to 2-kb extended promoter, CorePromoter will correctly localize the TSS to a 100-bp interval approximately 60% of the time. [Figure 3 can be found in its entirety as an online supplement at http://www.genome.org.]
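The second step of the strategy, scoring candidate TSS positions inside an extended promoter with QDA over 5-tuple measures, can be illustrated with a minimal sketch. This is not the CorePromoter implementation: the abstract does not specify the feature set, so the two-feature layout (an upstream and a downstream window score), the window width `w`, the 5-tuple score `table`, and every function name below are assumptions chosen for illustration only.

```python
import math

K = 5  # tuple length named in the abstract

def window_score(seq, start, end, table):
    """Mean table score of the 5-tuples that start in seq[start:end]."""
    start, end = max(start, 0), min(end, len(seq) - K + 1)
    if end <= start:
        return 0.0
    total = sum(table.get(seq[i:i + K], 0.0) for i in range(start, end))
    return total / (end - start)

def features(seq, pos, table, w=45):
    """Hypothetical position-dependent features around a candidate TSS:
    the 5-tuple score just upstream and just downstream of pos."""
    return (window_score(seq, pos - w, pos, table),
            window_score(seq, pos, pos + w, table))

def fit_class(points):
    """Mean and 2x2 sample covariance (a, b, d) of a list of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    a = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    d = sum((y - my) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (a, b, d)

def qda_score(x, mean, cov, prior):
    """Quadratic discriminant: ln(prior) - 0.5 ln|S| - 0.5 (x-m)^T S^-1 (x-m).
    Assumes the covariance matrix S = [[a, b], [b, d]] is non-singular."""
    a, b, d = cov
    det = a * d - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    maha = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.log(prior) - 0.5 * math.log(det) - 0.5 * maha

def predict_tss(seq, table, tss_model, bg_model, prior=0.5, w=45):
    """Scan every candidate position and return the one with the largest
    QDA margin (TSS class score minus background class score)."""
    best_pos, best_margin = None, -math.inf
    for pos in range(w, len(seq) - w):
        x = features(seq, pos, table, w)
        margin = (qda_score(x, *tss_model, prior)
                  - qda_score(x, *bg_model, 1.0 - prior))
        if margin > best_margin:
            best_pos, best_margin = pos, margin
    return best_pos, best_margin
```

In use, `fit_class` would be called once on feature vectors taken at known TSS positions and once on vectors taken at background positions, and the two resulting `(mean, cov)` pairs passed to `predict_tss` together with a 1- to 2-kb sequence. Because each class keeps its own covariance matrix, the decision boundary is quadratic rather than linear, which is the property that distinguishes QDA from linear discriminant analysis.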