Tompa M
Department of Computer Science and Engineering, University of Washington, Seattle 98195-2350, USA.
Proc Int Conf Intell Syst Mol Biol. 1999:262-71.
This is an investigation of methods for finding short motifs that occur in only a fraction of the input sequences. Unlike local search techniques, which may not reach a global optimum, the method proposed here is guaranteed to produce the motifs with the greatest z-scores. The method is illustrated on the Ribosome Binding Site Problem, which is to identify the short mRNA 5' untranslated sequence that is recognized by the ribosome during initiation of protein synthesis. Experiments were performed to solve this problem for each of fourteen sequenced prokaryotes by applying the method to the full complement of genes from each. One interesting result of these experiments is evidence that the sequence recognized in the thermophilic archaea A. fulgidus, M. jannaschii, M. thermoautotrophicum, and P. horikoshii may be somewhat different from the well-known Shine-Dalgarno sequence.
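The idea of ranking motifs by z-score through exhaustive enumeration, rather than local search, can be illustrated with a minimal sketch. The abstract does not describe the statistical model used, so everything below is an assumption made for illustration: the function name `kmer_z_scores` is hypothetical, positions within a sequence are treated as independent, bases are drawn i.i.d. from a background distribution, and self-overlap of a motif is ignored. It should not be read as the paper's actual algorithm.

```python
from itertools import product
from math import sqrt


def kmer_z_scores(seqs, k, background=None, alphabet="ACGT"):
    """Return {k-mer: z-score} for the number of sequences containing each k-mer.

    Assumed model (not taken from the paper): bases are i.i.d. with the given
    background frequencies, and window positions are treated as independent.
    """
    if background is None:
        # Assumption: estimate base frequencies from the input sequences.
        total = sum(s.count(b) for s in seqs for b in alphabet)
        background = {b: sum(s.count(b) for s in seqs) / total for b in alphabet}

    scores = {}
    # Exhaustive enumeration: every one of the len(alphabet)**k candidates is scored.
    for kmer in map("".join, product(alphabet, repeat=k)):
        observed = sum(1 for s in seqs if kmer in s)

        # Probability that one fixed window matches the k-mer.
        p_site = 1.0
        for base in kmer:
            p_site *= background[base]

        # Expected count and variance: a sum of independent Bernoulli trials,
        # one per sequence, each succeeding if any window matches.
        mean = 0.0
        var = 0.0
        for s in seqs:
            windows = max(len(s) - k + 1, 0)
            p_seq = 1.0 - (1.0 - p_site) ** windows
            mean += p_seq
            var += p_seq * (1.0 - p_seq)

        if var > 0:
            scores[kmer] = (observed - mean) / sqrt(var)
    return scores


if __name__ == "__main__":
    # Toy input: three sequences carry GGAGG, one does not.
    seqs = ["AAGGAGGTT", "CCGGAGGAC", "TTGGAGGCA", "ACGTACGTA"]
    uniform = {b: 0.25 for b in "ACGT"}
    scores = kmer_z_scores(seqs, 5, background=uniform)
    print(max(scores, key=scores.get))
```

Because every candidate is scored, the top-ranked k-mer is the global maximum under the assumed model. The cost grows as 4^k, so this kind of enumeration is practical only for short motifs.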