Pesole G, Prunella N, Liuni S, Attimonelli M, Saccone C
Dipartimento di Biochimica e Biologia Molecolare, Università di Bari, Italy.
Nucleic Acids Res. 1992 Jun 11;20(11):2871-5. doi: 10.1093/nar/20.11.2871.
We present a fast and sensitive method for isolating short nucleotide sequences that have non-random statistical properties and may therefore be biologically active. It is based on a first-order Markov analysis and detects sequence motifs six to ten nucleotides long that are significantly shared (or avoided) in the sequences under investigation. The method was tested on a set of 521 sequences extracted from the Eukaryotic Promoter Database (2). The results demonstrate its accuracy and efficiency: sequence motifs known to act as eukaryotic promoter elements, such as the TATA-box and the CAAT-box, were clearly identified. In addition, we found other statistically significant motifs whose biological roles are yet to be clarified.
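As a rough illustration of what a first-order Markov analysis of oligonucleotides can look like, the sketch below compares the observed count of each k-mer with the count expected from dinucleotide and mononucleotide frequencies. The abstract does not give the authors' exact statistic, so the estimator E(w) = ∏ N(w_i w_{i+1}) / ∏ N(w_i) (interior bases only), the z-like score, and all function names here are assumptions made for illustration, not the published procedure.

```python
from collections import Counter
from math import sqrt


def count_words(seqs, k):
    """Count every overlapping word of length k across all sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def expected_first_order(word, mono, di):
    """Expected count of `word` under a first-order Markov model.

    Assumed estimator: the product of the counts of its overlapping
    dinucleotides divided by the product of the counts of its interior bases.
    """
    num = 1.0
    for i in range(len(word) - 1):
        num *= di[word[i:i + 2]]
    den = 1.0
    for base in word[1:-1]:
        den *= mono[base]
    return num / den if den else 0.0


def motif_scores(seqs, k):
    """Return {word: (observed, expected, score)} for each observed k-mer.

    The score (obs - exp) / sqrt(exp) is a simple illustrative choice:
    positive values suggest over-representation, negative values avoidance.
    """
    mono = count_words(seqs, 1)
    di = count_words(seqs, 2)
    obs = count_words(seqs, k)
    scores = {}
    for word, o in obs.items():
        e = expected_first_order(word, mono, di)
        if e > 0:
            scores[word] = (o, e, (o - e) / sqrt(e))
    return scores
```

Calling `motif_scores(sequences, 6)` on a list of uppercase DNA strings and sorting by the third element of each tuple would rank hexamers by how far they depart from the first-order expectation. This sketch scores only words that occur at least once, so a motif that is completely absent would need to be enumerated separately.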