A software system for gene sequence database construction based on fast approximate string matching.

Author information

Liu Zheng, Borneman James, Jiang Tao

Publication information

Int J Bioinform Res Appl. 2005;1(3):273-91. doi: 10.1504/IJBRA.2005.007906.

Abstract

We propose a web-based software system for sequence acquisition and database construction. An example application of this system is to construct a ribosomal RNA gene (rDNA) sequence database to facilitate the study of microbial communities. A fast and accurate approximate string matching algorithm is implemented to fetch rDNA sequences sandwiched by two given primers from GenBank. A homology search algorithm based on the Basic Local Alignment Search Tool (BLAST) is then used to extract rDNA sequences that do not contain the primers. This two-step process leads to an rDNA sequence database for a specific taxonomic group. When performing string matching, we consider the distance between the occurrences of the two given primers, mismatches, and degeneracy. In the homology search, a chaining algorithm is combined with BLAST to obtain global alignments based on local alignments. This system can be used in many biological applications.
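The primer-matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm (the abstract does not give its details, and a naive scan like this one is not fast); it only shows the three constraints the abstract names: degeneracy (handled here with the standard IUPAC nucleotide codes), a mismatch budget, and a bound on the distance between the two primer occurrences. The function names `primer_hits` and `sandwiched_regions`, their parameters, and the simplification that both primers are given in the orientation of the scanned strand are assumptions made for illustration.

```python
# IUPAC nucleotide codes mapped to the bases each code stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_hits(seq, primer, max_mismatches):
    """Return start positions where `primer` matches `seq` with at most
    `max_mismatches` mismatching positions (hypothetical helper)."""
    hits = []
    m = len(primer)
    for i in range(len(seq) - m + 1):
        mismatches = 0
        for p, s in zip(primer, seq[i:i + m]):
            # A degenerate primer symbol matches any base in its IUPAC set.
            if s not in IUPAC[p]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            # Loop finished without exceeding the mismatch budget.
            hits.append(i)
    return hits


def sandwiched_regions(seq, fwd, rev, max_mismatches=1, min_len=0, max_len=2000):
    """Return (start, end, fragment) for every region lying between an
    occurrence of `fwd` and a later occurrence of `rev`, where the region
    length is within [min_len, max_len]. Primers are excluded from the
    fragment. Assumes both primers are written in the strand of `seq`."""
    regions = []
    rev_hits = primer_hits(seq, rev, max_mismatches)
    for f in primer_hits(seq, fwd, max_mismatches):
        start = f + len(fwd)
        for r in rev_hits:
            if min_len <= r - start <= max_len:
                regions.append((start, r, seq[start:r]))
    return regions
```

For example, `sandwiched_regions("TTACGTAGGGGCCCCTTGACAA", "ACGYA", "TTGACA", max_mismatches=0)` would report the region between the two primer occurrences, with `Y` in the forward primer matching either `C` or `T`. A production system would replace the quadratic scan with a faster approximate matching method.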

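The idea of chaining local alignments into a global one, mentioned in the abstract, can be sketched as follows. This is a generic textbook formulation (a quadratic-time dynamic program over colinear, non-overlapping local hits with no gap penalty), not the specific chaining algorithm of the paper, which the abstract does not describe. The `HSP` record and the `best_chain` function are hypothetical names, and the hits are assumed to have already been parsed from BLAST output as half-open coordinate ranges with positive scores.

```python
from typing import List, NamedTuple


class HSP(NamedTuple):
    """A local alignment hit (hypothetical record; end coordinates exclusive)."""
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    score: float


def best_chain(hsps: List[HSP]) -> List[HSP]:
    """Return the highest-scoring chain of HSPs that are colinear and
    non-overlapping in both the query and the subject sequence."""
    if not hsps:
        return []
    # Any valid predecessor ends before the current hit starts in the query,
    # so sorting by query end puts predecessors first.
    ordered = sorted(hsps, key=lambda h: (h.q_end, h.s_end))
    best = [h.score for h in ordered]   # best chain score ending at each hit
    prev = [-1] * len(ordered)          # back-pointers for traceback
    for j, cur in enumerate(ordered):
        for i in range(j):
            p = ordered[i]
            if p.q_end <= cur.q_start and p.s_end <= cur.s_start:
                cand = best[i] + cur.score
                if cand > best[j]:
                    best[j] = cand
                    prev[j] = i
    # Trace back from the best-scoring chain end.
    j = max(range(len(ordered)), key=best.__getitem__)
    chain = []
    while j != -1:
        chain.append(ordered[j])
        j = prev[j]
    return chain[::-1]
```

The chain returned by `best_chain` gives an ordered backbone of local alignments; the regions between consecutive hits would still need to be aligned to produce a full global alignment, and a realistic scoring scheme would also penalize the gaps between chained hits.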