Attaluri Pavan Kumar, Christman Mary C, Chen Zhengxin, Lu Guoqing
Bioinformation. 2011 Feb 7;5(9):400-1. doi: 10.6026/97320630005400.
Most bioinformatics tools require specialized input formats for sequence comparison and analysis. This is particularly true for molecular phylogeny programs, which accept only certain formats. In addition, it is often necessary to eliminate highly similar sequences from the input, especially when the dataset is large. Moreover, most programs place restrictions on sequence names. Here we introduce SeqMaT, a Sequence Manipulation Tool. It has the following functions: data format conversion, sequence name coding and decoding, redundant and highly similar sequence removal, and data mining utilities. SeqMaT was developed in Java and comes in two versions, web-based and standalone. The standalone program is convenient for manipulating a large number of sequences, while the web version makes the tool widely available to researchers and practitioners over the Internet.
SeqMaT is available for free at http://glee.ist.unomaha.edu/seqmat.
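Two of the functions listed in the abstract, sequence name coding and decoding and redundant sequence removal, can be illustrated with a minimal Java sketch. This is not SeqMaT's code: the class `SeqNameSketch`, its methods, the fixed-width code scheme (`S0000001`, `S0000002`, ...), and the sample records are all assumptions made for illustration. The sketch removes only exact duplicates (ignoring case); removal of highly similar sequences would need a similarity measure that the abstract does not specify. Short fixed-width codes are one plausible way to satisfy name restrictions such as the 10-character limit of strict PHYLIP format.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of name coding/decoding and exact-duplicate removal. */
public class SeqNameSketch {

    /** A FASTA-style record: a name and its sequence. */
    public static final class Record {
        public final String name;
        public final String sequence;

        public Record(String name, String sequence) {
            this.name = name;
            this.sequence = sequence;
        }
    }

    // Assumed code format: 'S' followed by exactly seven digits.
    private static final Pattern CODE = Pattern.compile("\\bS\\d{7}\\b");

    /** Keeps the first record for each distinct sequence (case-insensitive). */
    public static List<Record> removeExactDuplicates(List<Record> records) {
        Set<String> seen = new HashSet<>();
        List<Record> unique = new ArrayList<>();
        for (Record r : records) {
            if (seen.add(r.sequence.toUpperCase())) {
                unique.add(r);
            }
        }
        return unique;
    }

    /** Replaces each name with a short code and stores code -> original name. */
    public static List<Record> encodeNames(List<Record> records,
                                           Map<String, String> codeToName) {
        List<Record> coded = new ArrayList<>();
        int counter = codeToName.size();
        for (Record r : records) {
            counter++;
            String code = String.format("S%07d", counter);
            codeToName.put(code, r.name);
            coded.add(new Record(code, r.sequence));
        }
        return coded;
    }

    /** Restores original names in any text that contains codes, e.g. a tree file. */
    public static String decodeNames(String text, Map<String, String> codeToName) {
        Matcher m = CODE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String original = codeToName.get(m.group());
            // Unknown codes are left unchanged.
            String replacement = (original != null) ? original : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        List<Record> input = new ArrayList<>();
        input.add(new Record("sample one, long descriptive name", "ATGGCA"));
        input.add(new Record("sample two, same sequence", "atggca"));
        input.add(new Record("sample three", "ATGGCT"));

        List<Record> unique = removeExactDuplicates(input);
        Map<String, String> codeToName = new LinkedHashMap<>();
        List<Record> coded = encodeNames(unique, codeToName);

        for (Record r : coded) {
            System.out.println(">" + r.name + "\n" + r.sequence);
        }
        // A downstream program would emit output using the codes; decode it back.
        String tree = "(S0000001:0.1,S0000002:0.2);";
        System.out.println(decodeNames(tree, codeToName));
    }
}
```

Keeping the code-to-name map separate from the sequences means the coded file can be passed through a phylogeny program unchanged, and the original names can be restored in that program's output in a single pass.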