Harr R, Häggström M, Gustafsson P
Nucleic Acids Res. 1983 May 11;11(9):2943-57. doi: 10.1093/nar/11.9.2943.
A new type of search algorithm was developed to find biological information inherent in nucleic acid sequences. The algorithm is of the pattern-matching type and is based on the fact that genetic information is often a function of a predictable statistical occurrence of the four bases within parts of the sequence. The search algorithm compares the known statistical pattern of bases in, for example, a promoter with an unknown sequence and calculates the statistical significance of the match at every position in the unknown sequence. The program was tested on 54 published prokaryotic promoters: 44 could be found with 1 false answer, or 49 with 4 false answers. The program was also used on plasmid pBR322. All promoters functioning in an in vitro transcription system were found (tet, anti-tet, p4, bla and ori) except the so-called p5 promoter. A search for donor and acceptor sites was performed in a human HLA genomic sequence that contains six introns. Five of the possible six donor and acceptor sites were found.
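The general idea of matching a per-position base pattern against every position of a sequence can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the scoring formula, so a log-odds position weight matrix with pseudocounts and a uniform background stands in for the authors' significance calculation. The function names (`build_log_odds`, `scan`, `best_hits`) and the toy example sites are hypothetical and are not taken from the paper.

```python
import math

BASES = "ACGT"


def build_log_odds(sites, pseudocount=1.0, background=0.25):
    """Build a per-position log-odds table from aligned example sites.

    Assumption: a uniform background (0.25 per base) and a simple
    pseudocount; the paper's own statistic may differ.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all example sites must have the same length")
    table = []
    for i in range(length):
        # Start every base at the pseudocount so unseen bases stay finite.
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        table.append(
            {b: math.log2(counts[b] / total / background) for b in BASES}
        )
    return table


def scan(sequence, table):
    """Score every window of the sequence; return (start, score) pairs.

    The sequence must use upper-case A, C, G, T only.
    """
    width = len(table)
    return [
        (start, sum(table[i][sequence[start + i]] for i in range(width)))
        for start in range(len(sequence) - width + 1)
    ]


def best_hits(sequence, table, threshold):
    """Keep only the windows whose score reaches the threshold."""
    return [(p, s) for p, s in scan(sequence, table) if s >= threshold]


if __name__ == "__main__":
    # Toy, made-up example sites (not data from the paper).
    toy_sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
    table = build_log_odds(toy_sites)
    for start, score in best_hits("GGCGTATAATGCCG", table, threshold=6.5):
        print(start, round(score, 2))
```

Raising or lowering the `threshold` argument trades missed sites against false answers, which is the kind of trade-off the two reported results (44 found with 1 false answer, 49 found with 4) reflect.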