ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches.

Author information

Rognes T

Affiliation

Department of Molecular Biology, Institute of Medical Microbiology, University of Oslo, The National Hospital, NO-0027 Oslo, Norway.

Publication information

Nucleic Acids Res. 2001 Apr 1;29(7):1647-52. doi: 10.1093/nar/29.7.1647.

DOI:10.1093/nar/29.7.1647

PMID:11266569

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC31274/

Abstract

There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign were found to be as good as Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
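
The first stage described in the abstract, the optimal ungapped alignment score for every diagonal of the alignment matrix, can be illustrated with a small scalar sketch. This is not the ParAlign implementation: the function name `diagonal_scores` and the match/mismatch scores of +1/-1 are assumptions made for illustration (a real protein search would use a substitution matrix), and the code processes one cell at a time rather than using SIMD instructions. The heuristic that combines several diagonal scores into an approximate gapped score is not specified in the abstract, so it is not attempted here.

```python
def diagonal_scores(query, target, match=1, mismatch=-1):
    """Return the best ungapped local alignment score on each diagonal.

    Diagonal d contains the cells (i, j) with j - i == d, where i indexes
    the query and j indexes the target. On a single diagonal the optimal
    ungapped local score is the maximum-scoring contiguous run of cells,
    which a running sum clamped at zero finds in one pass.

    The scoring values are illustrative assumptions, not ParAlign's.
    """
    m, n = len(query), len(target)
    best = {}
    for d in range(-(m - 1), n):
        # First cell of diagonal d inside the matrix.
        i = max(0, -d)
        j = i + d
        running = 0  # score of the best run ending at the current cell
        top = 0      # best score seen anywhere on this diagonal
        while i < m and j < n:
            step = match if query[i] == target[j] else mismatch
            running = max(0, running + step)
            top = max(top, running)
            i += 1
            j += 1
        best[d] = top
    return best


if __name__ == "__main__":
    scores = diagonal_scores("ACGT", "TTACGTAA")
    # The query occurs in the target at offset 2, so diagonal 2 scores highest.
    print(max(scores, key=scores.get), scores[2])
```

Each diagonal is independent of the others, which is what makes this stage a natural fit for the parallel evaluation the abstract describes.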

Similar articles

ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches.

Nucleic Acids Res. 2001 Apr 1;29(7):1647-52. doi: 10.1093/nar/29.7.1647.

Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors.

Bioinformatics. 2000 Aug;16(8):699-706. doi: 10.1093/bioinformatics/16.8.699.

Accelerated Profile HMM Searches.

PLoS Comput Biol. 2011 Oct;7(10):e1002195. doi: 10.1371/journal.pcbi.1002195. Epub 2011 Oct 20.

PARALIGN: rapid and sensitive sequence similarity searches powered by parallel computing technology.

Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W535-9. doi: 10.1093/nar/gki423.

Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation.

BMC Bioinformatics. 2011 Jun 1;12:221. doi: 10.1186/1471-2105-12-221.

SALSA: improved protein database searching by a new algorithm for assembly of sequence fragments into gapped alignments.

Bioinformatics. 1998;14(10):839-45. doi: 10.1093/bioinformatics/14.10.839.

PLAST: parallel local alignment search tool for database comparison.

BMC Bioinformatics. 2009 Oct 12;10:329. doi: 10.1186/1471-2105-10-329.

Striped Smith-Waterman speeds database searches six times over other SIMD implementations.

Bioinformatics. 2007 Jan 15;23(2):156-61. doi: 10.1093/bioinformatics/btl582. Epub 2006 Nov 16.

A table-driven, full-sensitivity similarity search algorithm.

J Comput Biol. 2003;10(2):103-17. doi: 10.1089/106652703321825919.

An efficient algorithm after ungapped analysis in BLAST.

DNA Seq. 2006 Aug;17(4):247-53. doi: 10.1080/10425170600805128.

Cited by

MODOMICS: a database of RNA modification pathways. 2017 update.

Nucleic Acids Res. 2018 Jan 4;46(D1):D303-D307. doi: 10.1093/nar/gkx1030.

Exploring the utility of cross-laboratory RAD-sequencing datasets for phylogenetic analysis.

BMC Res Notes. 2015 Jul 8;8:299. doi: 10.1186/s13104-015-1261-2.

An algorithm of discovering signatures from DNA databases on a computer cluster.

BMC Bioinformatics. 2014 Oct 5;15(1):339. doi: 10.1186/1471-2105-15-339.

Coiled-coil proteins facilitated the functional expansion of the centrosome.

PLoS Comput Biol. 2014 Jun 5;10(6):e1003657. doi: 10.1371/journal.pcbi.1003657. eCollection 2014 Jun.

PSimScan: algorithm and utility for fast protein similarity search.

PLoS One. 2013;8(3):e58505. doi: 10.1371/journal.pone.0058505. Epub 2013 Mar 7.

Accelerated Profile HMM Searches.

PLoS Comput Biol. 2011 Oct;7(10):e1002195. doi: 10.1371/journal.pcbi.1002195. Epub 2011 Oct 20.

A quick guide for developing effective bioinformatics programming skills.

PLoS Comput Biol. 2009 Dec;5(12):e1000589. doi: 10.1371/journal.pcbi.1000589. Epub 2009 Dec 24.

Cohesive versus flexible evolution of functional modules in eukaryotes.

PLoS Comput Biol. 2009 Jan;5(1):e1000276. doi: 10.1371/journal.pcbi.1000276. Epub 2009 Jan 30.

Overexpression of the LexA-regulated tisAB RNA in E. coli inhibits SOS functions; implications for regulation of the SOS response.

Nucleic Acids Res. 2008 Nov;36(19):6249-59. doi: 10.1093/nar/gkn633. Epub 2008 Oct 1.

RNAmmer: consistent and rapid annotation of ribosomal RNA genes.

Nucleic Acids Res. 2007;35(9):3100-8. doi: 10.1093/nar/gkm160. Epub 2007 Apr 22.
