Doron Betel, Christopher W. V. Hogue
Department of Biochemistry, University of Toronto, Toronto, Ontario, M5S 1A8, Canada.
BMC Bioinformatics. 2002 Jul 31;3:20. doi: 10.1186/1471-2105-3-20.
Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools for such simple searches, for example identifying transcription factor binding sites, protein motifs, or repetitive DNA sequences. However, in many cases simple pattern-matching searches can reveal a wealth of information. In this paper we present a regular expression pattern-matching tool that we used to identify short repetitive DNA sequences in human coding regions, in order to find potential mutation sites in mismatch repair-deficient cells.
Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences from ten different organisms. The program is designed to support a wide range of queries, with no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/.
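As an illustration of the kind of query described above, the following is a minimal sketch of regular expression searching over a set of sequences using Python's standard `re` module. It is not Kangaroo's implementation or interface: the function name `search_sequences`, the toy sequence identifiers, and the choice to report overlapping matches are all assumptions made for this example. The motif used is the PROSITE N-glycosylation pattern `N-{P}-[ST]-{P}`, hand-translated into the regular expression `N[^P][ST][^P]`.

```python
import re
from typing import Dict, List, Tuple


def search_sequences(records: Dict[str, str], pattern: str) -> List[Tuple[str, int, str]]:
    """Report every (possibly overlapping) match of `pattern` in each sequence.

    Hypothetical helper for illustration only. Returns tuples of
    (sequence_id, 1-based start position, matched text).
    """
    # Wrapping the pattern in a zero-width lookahead lets finditer report
    # overlapping hits: after an empty match the scan advances by one position.
    regex = re.compile(f"(?=({pattern}))")
    hits = []
    for seq_id, seq in records.items():
        # Sequences are upper-cased, so patterns should use upper-case letters.
        for m in regex.finditer(seq.upper()):
            hits.append((seq_id, m.start() + 1, m.group(1)))
    return hits


if __name__ == "__main__":
    # Toy protein sequences (invented for this example, not real database entries).
    proteins = {"toy1": "MKNVSAGNPSL", "toy2": "ANNTTQ"}
    for hit in search_sequences(proteins, "N[^P][ST][^P]"):
        print(hit)
```

Here `toy1` yields one hit (`NVSA` at position 3), the `NPSL` site is rejected because proline follows the asparagine, and `toy2` yields two overlapping hits (`NNTT` and `NTTQ`). The same function works unchanged on DNA, for example with a TATA-box-like pattern such as `TATA[AT]A[AT]`.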
A simple, low-level pattern-matching application can be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a variant of human colorectal cancer characterized by a high frequency of mutations in coding regions containing mononucleotide repeats.
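A search for mononucleotide repeats of the kind mentioned above can be expressed as a single regular expression with a backreference. The sketch below is a hypothetical illustration, not the query used in the study: the function name `mononucleotide_runs` and the default minimum run length of 8 are assumptions, since the abstract does not state the threshold that was used.

```python
import re
from typing import List, Tuple


def mononucleotide_runs(cds: str, min_len: int = 8) -> List[Tuple[str, int, int]]:
    """Find maximal runs of a single nucleotide at least `min_len` bases long.

    Hypothetical helper; the default threshold of 8 is an assumption.
    Returns tuples of (base, 1-based start position, run length).
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # ([ACGT]) captures one base and \1{k,} requires at least k further copies
    # of that same base. Greedy matching makes each reported run maximal.
    regex = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.group(1), m.start() + 1, len(m.group(0)))
        for m in regex.finditer(cds.upper())
    ]


if __name__ == "__main__":
    # Toy coding sequence (invented): start codon, an A8 run, a C9 run, stop codon.
    cds = "ATG" + "A" * 8 + "G" + "C" * 9 + "TGA"
    print(mononucleotide_runs(cds))
```

On this toy sequence the function reports the A8 run starting at position 4 and the C9 run starting at position 13; raising `min_len` to 9 keeps only the C9 run. Because a one-base insertion or deletion inside such a run shifts the reading frame, scanning coding sequences this way yields candidate frameshift sites.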