The J. Craig Venter Institute, Rockville, MD 20850, USA.
Bioinformatics. 2011 Jul 1;27(13):1869-70. doi: 10.1093/bioinformatics/btr285. Epub 2011 May 6.
The large number of genomes due to be sequenced will need to be annotated with genes and other functional features. Aligning gene sequences from a related species to the target genome is an economical and highly reliable way to identify genes; unfortunately, existing tools lack sensitivity and speed. A program we previously reported, sim4cc, was shown to be highly accurate but is limited to comparing one cDNA with one genomic sequence. We present here an optimization of the tool, implemented in the packages sim4db and leaff. The new tool performs batch alignments of cDNA and genomic sequences in a fraction of the time required by its predecessor, and is thus well suited to genome-wide analyses.
Sim4db and leaff are written in C, C++ and Perl for Linux and other Unix platforms. Source code is distributed free of charge from http://sourceforge.net/projects/kmer/.