SIMD parallelization of the WORDUP algorithm for detecting statistically significant patterns in DNA sequences.

Author information

Liuni S, Prunella N, Pesole G, D'Orazio T, Stella E, Distante A

Affiliation

CSMME-CNR, Bari, Italy.

Publication information

Comput Appl Biosci. 1993 Dec;9(6):701-7. doi: 10.1093/bioinformatics/9.6.701.

PMID:8143157

Abstract

The development of new nucleic acid sequencing techniques has produced a large amount of sequence data and has led to the discovery of new relationships. In this paper, we study a method for parallelizing WORDUP, an algorithm that detects statistically significant patterns in DNA sequences. WORDUP implements an efficient method for identifying statistically significant oligomers in a group of non-homologous sequences. It is based on a modified version of the Boyer-Moore algorithm, one of the fastest string-matching algorithms in the literature. The parallel version of WORDUP presented here aims to reduce computation time and to allow the analysis of larger sets of longer nucleotide sequences, which is usually impractical with sequential algorithms.
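
As a rough illustration of the kind of string matching the abstract refers to, the sketch below searches for an oligomer with the Boyer-Moore-Horspool simplification of Boyer-Moore and counts how many sequences in a group contain it. It is not the modified Boyer-Moore variant used by WORDUP, which the abstract does not describe, and it includes neither the statistical significance test nor any SIMD parallelization. The names `horspool_find` and `sequences_containing` are hypothetical and chosen only for this example.

```python
from typing import Iterable


def horspool_find(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    Boyer-Moore-Horspool: a simplified Boyer-Moore that uses only a
    bad-character shift table (assumption: not the variant used by WORDUP).
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1

    # For every character except the last one in the pattern, record the
    # distance from its rightmost occurrence to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    pos = 0
    while pos <= n - m:
        # Compare the pattern against the current window, right to left.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos
        # Shift according to the text character aligned with the pattern end;
        # characters absent from the table allow a full-length shift.
        pos += shift.get(text[pos + m - 1], m)
    return -1


def sequences_containing(oligomer: str, sequences: Iterable[str]) -> int:
    """Count how many sequences contain the oligomer at least once."""
    return sum(1 for seq in sequences if horspool_find(seq, oligomer) != -1)
```

A significance analysis would compare such per-sequence counts with the counts expected under some background model; that step is outside the scope of this sketch.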
