Sequence search algorithms for single pass sequence identification: does one size fit all?

Authors

Woodwark K C, Hubbard S J, Oliver S G

Affiliation

Department of Biomolecular Sciences UMIST, Manchester M60 1QD, UK.

Publication information

Comp Funct Genomics. 2001;2(1):4-9. doi: 10.1002/cfg.61.

Abstract

Bioinformatic tools have become essential to biologists in their quest to understand the vast quantities of sequence data, and now whole genomes, which are being produced at an ever increasing rate. Much of these sequence data are single-pass sequences, such as sample sequences from organisms closely related to other organisms of interest which have already been sequenced, or cDNAs or expressed sequence tags (ESTs). These single-pass sequences often contain errors, including frameshifts, which complicate the identification of homologues, especially at the protein level. Therefore, sequence searches with this type of data are often performed at the nucleotide level. The most commonly used sequence search algorithms for the identification of homologues are Washington University's and the National Center for Biotechnology Information's (NCBI) versions of the BLAST suites of tools, which are to be found on websites all over the world. The work reported here examines the use of these tools for comparing sample sequence datasets to a known genome. It shows that care must be taken when choosing the parameters to use with the BLAST algorithms. NCBI's version of gapped BLASTn gives much shorter, and sometimes different, top alignments to those found using Washington University's version of BLASTn (which also allows for gaps), when both are used with their default parameters. Most of the differences in performance were found to be due to the choices of default parameters rather than underlying differences between the two algorithms. Washington University's version, used with defaults, compares very favourably with the results obtained using the accurate but computationally intensive Smith-Waterman algorithm.
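As a rough illustration of why scoring parameters matter, the following is a minimal sketch of Smith-Waterman local alignment with a linear gap penalty. It is not the implementation used in the paper, and the `match`, `mismatch` and `gap` values are illustrative assumptions, not the defaults of either BLAST distribution. The function name `smith_waterman` and the toy sequences are likewise hypothetical.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-2):
    """Return (best_score, aligned_part_of_a, aligned_part_of_b).

    Simplified Smith-Waterman: linear gap penalty, and only the
    ungapped source substrings covered by the best local alignment
    are returned. All scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best local alignment score ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                0,                        # start a new local alignment
                score[i - 1][j - 1] + sub,  # match or mismatch
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            score[i][j] = cell
            if cell > best:               # keep the first maximum found
                best, best_pos = cell, (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    end_i, end_j = i, j
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:end_i], b[j:end_j]


query, subject = "ACGTACGT", "ACGTTCGT"  # toy sequences, one mismatch
print(smith_waterman(query, subject, mismatch=-1))  # (6, 'ACGTACGT', 'ACGTTCGT')
print(smith_waterman(query, subject, mismatch=-5))  # (4, 'ACGT', 'ACGT')
```

With the lenient mismatch penalty the top alignment spans the full sequences; with the harsher one it stops at the mismatch and is half as long. The algorithm is identical in both calls, which is consistent with the abstract's observation that differences in top alignments can stem from parameter choices rather than from the underlying method. The nested loops also show why the approach is computationally intensive: the work grows with the product of the two sequence lengths.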
