Yan Thomas, Yoo Danny, Berardini Tanya Z, Mueller Lukas A, Weems Dan C, Weng Shuai, Cherry J Michael, Rhee Seung Y
Department of Plant Biology, Carnegie Institution of Washington, 260 Panama Street, Stanford, CA 94305, USA.
Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W262-6. doi: 10.1093/nar/gki368.
Here, we present PatMatch, an efficient, web-based pattern-matching program that enables searches for short nucleotide or peptide sequences such as cis-elements in nucleotide sequences or small domains and motifs in protein sequences. The program can be used to find matches to a user-specified sequence pattern that can be described using ambiguous sequence codes and a powerful and flexible pattern syntax based on regular expressions. A recent upgrade has improved performance, and PatMatch now supports both mismatches and wildcards in a single pattern. This enhancement has been achieved by replacing the previous searching algorithm, scan_for_matches [D'Souza et al. (1997), Trends in Genetics, 13, 497-498], with nondeterministic-reverse grep (NR-grep), a general pattern matching tool that allows for approximate string matching [Navarro (2001), Software Practice and Experience, 31, 1265-1312]. We have tailored NR-grep to be used for DNA and protein searches with PatMatch. The stand-alone version of the software can be adapted for use with any sequence dataset and is available for download at The Arabidopsis Information Resource (TAIR) at ftp://ftp.arabidopsis.org/home/tair/Software/Patmatch/. The PatMatch server is available on the web at http://www.arabidopsis.org/cgi-bin/patmatch/nph-patmatch.pl for searching Arabidopsis thaliana sequences.
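To make the idea of searching with ambiguous sequence codes and mismatches concrete, the following is a minimal Python sketch. It is not PatMatch's pattern syntax and not the NR-grep algorithm: the function names (`iupac_to_regex`, `find_exact`, `find_with_mismatches`) are hypothetical, the pattern is assumed to consist only of standard IUPAC nucleotide codes, and mismatches are counted as substitutions in a naive sliding-window scan rather than with NR-grep's approximate-matching machinery.

```python
import re

# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def iupac_to_regex(pattern):
    """Translate a plain IUPAC pattern into an equivalent regular expression."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC_DNA[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_exact(pattern, sequence):
    """Return 0-based start positions of all exact matches, overlaps included."""
    # Wrapping the pattern in a lookahead lets overlapping matches be reported.
    regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    return [m.start() for m in regex.finditer(sequence.upper())]


def find_with_mismatches(pattern, sequence, max_mismatches):
    """Naive scan (illustration only): allow up to max_mismatches substitutions.

    Returns (start, matched_text, mismatch_count) tuples.
    """
    pattern = pattern.upper()
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - len(pattern) + 1):
        window = sequence[start:start + len(pattern)]
        # A position mismatches when the base is not allowed by the pattern code.
        mismatches = sum(
            1 for code, base in zip(pattern, window)
            if base not in IUPAC_DNA[code]
        )
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits
```

For example, on the made-up sequence `TTCACGTGAACACATGTT`, `find_exact("CANNTG", ...)` reports starts 2 and 10, and `find_with_mismatches("CACGTG", ..., 1)` reports the exact hit at 2 together with `CACATG` at 10, which differs by one substitution. The naive scan costs time proportional to the pattern length times the sequence length, which is the kind of overhead a dedicated tool such as NR-grep is meant to avoid.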