Blanchette M, Schwikowski B, Tompa M
Department of Computer Science and Engineering, University of Washington, Seattle 98195-2350, USA.
Proc Int Conf Intell Syst Mol Biol. 2000;8:37-45.
The identification of sequence motifs is a fundamental method for suggesting good candidates for biologically functional regions such as promoters, splice sites, binding sites, etc. We investigate the following approach to identifying motifs: given a collection of orthologous sequences from multiple species related by a known phylogenetic tree, search for motifs that are well conserved (according to a parsimony measure) in the species. We present an exact algorithm for solving this problem. We then discuss experimental results on finding promoters of the rbcS gene for a family of 10 plants, on finding promoters of the adh gene for 12 Drosophila species, and on finding promoters of several chloroplast encoded genes.
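To make the idea of "well conserved according to a parsimony measure" on a known tree concrete, here is a minimal sketch of a substring-parsimony dynamic program. It makes several assumptions that the abstract does not state: a fixed motif length k, the DNA alphabet, Hamming distance as the mutation cost on each tree edge, and a dict-based tree format. The names `substring_parsimony` and `hamming` and the `tree`/`seqs` formats are hypothetical. Each node u gets a table W_u(w), the minimum number of mutations in u's subtree given that u is labeled with k-mer w. At a leaf, W_u(w) is 0 if w occurs anywhere in that species' sequence and infinite otherwise. The best score is the minimum over the root's table. This is an illustrative brute-force version, not a reproduction of the authors' exact algorithm or its optimizations.

```python
from itertools import product

ALPHABET = "ACGT"


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def substring_parsimony(tree, seqs, root, k):
    """Hypothetical sketch: minimum parsimony score of a length-k motif.

    tree: dict mapping each internal node to its list of children (assumed format)
    seqs: dict mapping each leaf (species) to its DNA sequence
    Returns (best_score, k-mer labeling the root in an optimal solution).
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    inf = float("inf")

    def table(node):
        # Leaf: cost 0 for any k-mer occurring somewhere in its sequence, else infinite.
        if node in seqs:
            s = seqs[node]
            present = {s[i:i + k] for i in range(len(s) - k + 1)}
            return {w: 0 if w in present else inf for w in kmers}
        # Internal node labeled w: each child independently picks the label x
        # minimizing (child's subtree cost + Hamming mutations on the edge w -> x).
        child_tables = [table(c) for c in tree[node]]
        return {
            w: sum(min(t[x] + hamming(w, x) for x in kmers) for t in child_tables)
            for w in kmers
        }

    root_table = table(root)
    # Ties are broken lexicographically by k-mer for a deterministic result.
    score, best = min((cost, w) for w, cost in root_table.items())
    return score, best


if __name__ == "__main__":
    tree = {"root": ["x", "c"], "x": ["a", "b"]}
    seqs = {"a": "TTACGTT", "b": "GGACGAA", "c": "CCACGCC"}
    print(substring_parsimony(tree, seqs, "root", k=3))  # (0, 'ACG')
```

Here all three sequences share `ACG`, so that motif costs zero mutations. Enumerating all 4^k labels at every node costs on the order of 4^(2k) operations per edge. That is fine for small illustrative k but far too slow for realistic motif lengths without further pruning.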