A basic analysis toolkit for biological sequences.

Author information

Giancarlo Raffaele, Siragusa Alessandro, Siragusa Enrico, Utro Filippo

Affiliation

Dipartimento di Matematica ed Applicazioni, Università di Palermo, Italy.

Publication information

Algorithms Mol Biol. 2007 Sep 18;2:10. doi: 10.1186/1748-7188-2-10.

DOI:10.1186/1748-7188-2-10

PMID:17877802

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2147010/

Abstract

This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks: local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. It also supports filtering operations that select strings from a set and establish their statistical significance via z-score computation. None of the algorithms is new, but although they are generally regarded as fundamental for sequence analysis, they have not previously been implemented in a single, consistent software package, as we do here. Our main contribution is therefore to fill this gap between algorithmic theory and practice by providing an extensible and easy-to-use software library that includes algorithms for the string matching and alignment problems mentioned above. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with Bioperl and can also be used as a stand-alone system with a GUI. The software is available at http://www.math.unipa.it/~raffaele/BATS/ under the GNU GPL.
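To make the phrase "local alignments, via approximate string matching" concrete, here is a minimal Python sketch of the classical dynamic-programming formulation of string matching with at most k differences. It is illustrative only and is not taken from BATS: the function name `approx_match_ends`, its parameters, and the unit edit costs are assumptions, and the abstract does not say which algorithm or interface the library itself uses.

```python
def approx_match_ends(pattern: str, text: str, k: int) -> list[int]:
    """Return the 1-based positions j such that some substring of `text`
    ending at j is within edit distance k of `pattern`.

    Hypothetical helper for illustration; unit cost for each substitution,
    insertion, and deletion is an assumption.
    """
    m = len(pattern)
    # col[i] holds the smallest edit distance between pattern[:i] and any
    # substring of the text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]  # value for (i-1, j-1); col[0] stays 0 because
                            # an occurrence may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new = min(
                prev_diag + cost,  # match or substitution
                col[i] + 1,        # extra character in the text
                col[i - 1] + 1,    # pattern character left unmatched
            )
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approx_match_ends("ACGT", "TTACGTTT", 0))
    print(approx_match_ends("ACGT", "TTACGTTT", 1))
```

With k = 0 the sketch reports only the exact occurrence, which ends at position 6. With k = 1 it also reports positions 5 and 7, where a single deletion or insertion suffices. Keeping one column of the table at a time gives O(m) memory and O(mn) time for a pattern of length m and a text of length n.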
