Improving the efficiency of dot-matrix similarity searches through use of an oligomer table.

Author information

Fristensky B

Publication information

Nucleic Acids Res. 1986 Jan 10;14(1):597-610. doi: 10.1093/nar/14.1.597.

DOI:10.1093/nar/14.1.597

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC339447/

Abstract

Dot-matrix sequence similarity searches can be greatly speeded up through use of a table listing all locations of short oligomers in one of the sequences to find potential similarities with a second sequence. The algorithm described finds similarities between two sequences of lengths M and N, comparing L residues at a time, with an efficiency of $L \times M \times N / S^k$, where S is the alphabet size and k is the length of the oligomer. For nucleic acids, in which S = 4, use of a tetranucleotide table results in an efficiency of $L \times M \times N / 256$. The simplicity of the approach allows for a straightforward calculation of the level of similarities expected to be found for given search parameters. Furthermore, the storage required is minimal, allowing for even large sequences to be compared on small microcomputers. Theoretical considerations regarding the use of this search are discussed.
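The following Python sketch illustrates the oligomer-table idea described in the abstract. It is a minimal illustration under stated assumptions, not the paper's program. The function names, the rule that each L-residue window is anchored at the position of a shared k-mer, and the `min_match` identity threshold are all hypothetical choices made for this example.

```python
from collections import defaultdict


def build_oligomer_table(seq: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer (oligomer) in seq to the 0-based positions where it starts."""
    table: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i)
    return dict(table)


def oligomer_dot_matrix(seq_a: str, seq_b: str, k: int = 4, window: int = 8,
                        min_match: int = 6) -> list[tuple[int, int]]:
    """Return (i, j) pairs where seq_a[i:i+window] and seq_b[j:j+window] share
    at least min_match identical residues.

    Hypothetical seeding rule: a pair (i, j) is scored only if seq_a and seq_b
    carry the same k-mer at i and j. Those pairs are found by table lookup
    instead of visiting every cell of the M x N matrix.
    """
    if window < k:
        raise ValueError("window (L) must be at least k")
    table = build_oligomer_table(seq_a, k)  # built once, for seq_a only
    hits: list[tuple[int, int]] = []
    for j in range(len(seq_b) - window + 1):
        for i in table.get(seq_b[j:j + k], []):
            if i + window > len(seq_a):
                continue  # window would run past the end of seq_a
            # Count identities along the diagonal over `window` (L) residues.
            score = sum(x == y for x, y in zip(seq_a[i:i + window], seq_b[j:j + window]))
            if score >= min_match:
                hits.append((i, j))
    return hits


def expected_seed_hits(m: int, n: int, k: int, alphabet_size: int = 4) -> float:
    """Expected chance k-mer matches between sequences of lengths m and n,
    assuming independent, equiprobable residues (probability 1/S^k per pair)."""
    return (m - k + 1) * (n - k + 1) / alphabet_size ** k


if __name__ == "__main__":
    print(oligomer_dot_matrix("GATTACAGATTACA", "CCGATTACAGG"))  # [(0, 2), (1, 3)]
    print(expected_seed_hits(1000, 1000, 4))  # 3882.84765625
```

One way to read the abstract's estimate uses the model in `expected_seed_hits`. Under that model two random k-mers match with probability $1/S^k$, so roughly $M \times N / S^k$ table hits arise by chance. Scoring L residues at each hit then gives about $L \times M \times N / S^k$ comparisons. With S = 4 and k = 4, $4^4 = 256$, which matches the tetranucleotide figure. Real sequences are not uniformly random, so actual hit counts will differ.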
