A basic analysis toolkit for biological sequences.

Authors

Giancarlo Raffaele, Siragusa Alessandro, Siragusa Enrico, Utro Filippo

Affiliation

Dipartimento di Matematica Applicazioni, Università di Palermo, Italy.

Publication

Algorithms Mol Biol. 2007 Sep 18;2:10. doi: 10.1186/1748-7188-2-10.

DOI: 10.1186/1748-7188-2-10
PMID: 17877802
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2147010/
Abstract

This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks, namely local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. It also supports filtering operations that select strings from a set and establish their statistical significance via z-score computation. None of the algorithms is new; however, although they are generally regarded as fundamental for sequence analysis, they have not previously been implemented in a single, consistent software package, as we do here. Our main contribution is therefore to fill this gap between algorithmic theory and practice by providing an extensible, easy-to-use software library that includes algorithms for the string matching and alignment problems mentioned above. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with Bioperl and can also be used as a stand-alone system with a GUI. The software is available at http://www.math.unipa.it/~raffaele/BATS/ under the GNU GPL.
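Local alignment via approximate string matching can be illustrated with the classic column-by-column dynamic program usually credited to Sellers: it reports every position in a text where some substring ending there lies within edit distance k of a pattern. The following is a minimal Python sketch that assumes unit costs for substitutions, insertions and deletions; the function name `approximate_matches` is hypothetical and is not part of the BATS API.

```python
def approximate_matches(pattern: str, text: str, k: int) -> list[tuple[int, int]]:
    """Return (end_index, distance) for every 0-based text position j such that
    some substring of text ending at j is within edit distance k of pattern.
    Hypothetical helper, assuming unit edit costs (Sellers-style DP)."""
    m = len(pattern)
    # prev[i] = best edit distance between pattern[:i] and a substring of text
    # ending just before the current text character
    prev = list(range(m + 1))
    hits = []
    for j, c in enumerate(text):
        cur = [0] * (m + 1)  # cur[0] = 0: an occurrence may start anywhere in text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra text character
                         cur[i - 1] + 1)      # pattern character skipped
        if cur[m] <= k:
            hits.append((j, cur[m]))
        prev = cur  # only one DP column is kept in memory
    return hits
```

For example, `approximate_matches("ACGT", "TTACGTTT", 1)` returns `[(4, 1), (5, 0), (6, 1)]`. Keeping a single column reduces memory to O(m) for a pattern of length m, while the running time stays O(mn).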
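Global alignment via longest common subsequence corresponds to the standard quadratic-time dynamic program. The sketch below is illustrative only: `lcs` is a hypothetical helper, not a BATS function, and it recovers one longest common subsequence by a traceback over a table of suffix LCS lengths.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (hypothetical helper)."""
    n, m = len(a), len(b)
    # L[i][j] = length of an LCS of the suffixes a[i:] and b[j:]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                L[i][j] = L[i + 1][j + 1] + 1
            else:
                L[i][j] = max(L[i + 1][j], L[i][j + 1])
    # Traceback: taking a matching pair is always safe; otherwise follow the
    # larger of the two neighbouring subproblems
    i = j = 0
    out = []
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif L[i + 1][j] >= L[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)
```

For example, `lcs("AGGTAB", "GXTXAYB")` returns `"GTAB"`. Time and space are both O(nm); only the length would be needed to score the alignment, but the full table makes the traceback simple.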
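Affine gap costs, where a gap of length k costs gap_open + gap_extend * k, are commonly handled with Gotoh's three-state recurrence, which keeps the running time at O(nm). The sketch below computes only the optimal cost (no traceback) and assumes a unit mismatch cost and the illustrative defaults gap_open=3, gap_extend=1; the function name and parameter values are hypothetical, not those of BATS.

```python
import math


def affine_global_cost(a: str, b: str, mismatch: int = 1,
                       gap_open: int = 3, gap_extend: int = 1) -> float:
    """Minimum global alignment cost where a gap of length k costs
    gap_open + gap_extend * k (Gotoh-style three-state DP; hypothetical helper)."""
    INF = math.inf
    n, m = len(a), len(b)
    # M: last column pairs a[i-1] with b[j-1]
    # X: last column pairs a[i-1] with a gap; Y: last column pairs b[j-1] with a gap
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    X = [[INF] * (m + 1) for _ in range(n + 1)]
    Y = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + gap_extend * i
    for j in range(1, m + 1):
        Y[0][j] = gap_open + gap_extend * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + sub
            # Either extend an existing gap or open a new one
            X[i][j] = min(X[i - 1][j] + gap_extend,
                          M[i - 1][j] + gap_open + gap_extend,
                          Y[i - 1][j] + gap_open + gap_extend)
            Y[i][j] = min(Y[i][j - 1] + gap_extend,
                          M[i][j - 1] + gap_open + gap_extend,
                          X[i][j - 1] + gap_open + gap_extend)
    return min(M[n][m], X[n][m], Y[n][m])
```

With these defaults, aligning "AAAAAA" against "AA" costs 7: one gap of length 4 (3 + 4) is cheaper than any split into two gaps, for instance two gaps of length 2 at (3 + 2) * 2 = 10.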
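For concave gap costs, such as a logarithmic function of the gap length, the straightforward recurrence tries every possible gap length at each cell and takes O(nm(n + m)) time. Faster algorithms for concave functions exist (for example, the approach of Miller and Myers), but they are considerably more involved. The sketch below implements only the simple cubic recurrence for an arbitrary gap cost function `w`; the name `general_gap_global_cost` and the example cost functions are assumptions for illustration.

```python
import math
from typing import Callable


def general_gap_global_cost(a: str, b: str, w: Callable[[int], float],
                            mismatch: float = 1.0) -> float:
    """Minimum global alignment cost where a gap of length k costs w(k).
    Naive cubic-time recurrence (hypothetical helper); works for any w,
    including concave ones such as 4 + log2(k)."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = math.inf
            if i > 0 and j > 0:
                best = D[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            # Gap of length k ending here that consumes a[i-k:i]
            for k in range(1, i + 1):
                best = min(best, D[i - k][j] + w(k))
            # Gap of length k ending here that consumes b[j-k:j]
            for k in range(1, j + 1):
                best = min(best, D[i][j - k] + w(k))
            D[i][j] = best
    return D[n][m]
```

With the concave cost w(k) = 4 + log2(k), aligning "AAAAAA" against "AA" costs 6.0: two matches plus a single gap of length 4. Passing an affine function such as `lambda k: 3 + k` reproduces affine costs, only more slowly than a dedicated three-state recurrence.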
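The z-score filtering step compares how often a word occurs with how often it would be expected to occur under a background model. A minimal sketch follows. It assumes an i.i.d. model whose letter probabilities are estimated from the sequence itself, and it uses a binomial variance that ignores overlaps between occurrences. This is a simplification, since exact variances for overlapping words are more complicated. `word_zscores` and `filter_by_zscore` are hypothetical names, and the statistics used by BATS may differ.

```python
import math
from collections import Counter


def word_zscores(seq: str, k: int) -> dict[str, float]:
    """z-scores of the observed k-words of seq (hypothetical helper).
    Assumes an i.i.d. letter model estimated from seq and a binomial
    variance that ignores overlapping occurrences."""
    n = len(seq)
    if k <= 0 or k > n:
        return {}
    probs = {c: cnt / n for c, cnt in Counter(seq).items()}
    positions = n - k + 1
    observed = Counter(seq[i:i + k] for i in range(positions))
    scores = {}
    for word, obs in observed.items():
        p = math.prod(probs[c] for c in word)  # probability of word at a fixed position
        expected = positions * p
        var = positions * p * (1 - p)  # binomial approximation (overlaps ignored)
        scores[word] = (obs - expected) / math.sqrt(var) if var > 0 else 0.0
    return scores


def filter_by_zscore(scores: dict[str, float], threshold: float) -> list[str]:
    """Keep words whose z-score is at least threshold, sorted alphabetically."""
    return sorted(w for w, z in scores.items() if z >= threshold)
```

On "AAAATTTT" with k = 2, the words AA and TT each occur 3 times against an expected 1.75, giving z of about 1.09, so a threshold of 1.0 keeps exactly those two words.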

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41fa/2147010/333adc3b1782/1748-7188-2-10-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41fa/2147010/6a67af615968/1748-7188-2-10-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41fa/2147010/3e5e0ed0f004/1748-7188-2-10-3.jpg
