Improved sensitivity of biological sequence database searches.

Author information

Brutlag D L, Dautricourt J P, Maulik S, Relph J

Affiliation

Department of Biochemistry, Beckman Center, Stanford University School of Medicine, CA 94305.

Publication information

Comput Appl Biosci. 1990 Jul;6(3):237-45. doi: 10.1093/bioinformatics/6.3.237.

PMID:2207748

Abstract

We have increased the sensitivity of DNA and protein sequence database searches by allowing similar but non-identical amino acids or nucleotides to match. In addition, one can match k-tuples or words instead of matching individual residues in order to speed the search. A matching matrix specifies which k-tuples match each other. The matching matrix can be calculated from a similarity matrix of amino acids and a threshold of similarity required for matching. This permits amino acid similarity matrices or replacement matrices (PAM matrices) to be used in the first step of a sequence comparison rather than in a secondary scoring phase. The concept of matching non-identical k-tuples also increases the power of DNA database searches. For example, a matrix that specifies that any 3-tuple in a DNA sequence can match any other 3-tuple encoding the same amino acid permits a DNA database search using a DNA query sequence for regions that would encode a similar amino acid sequence.
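The two ideas in the abstract, a matching matrix derived from a similarity matrix plus a threshold, and DNA 3-tuples that match when they encode the same amino acid, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy similarity scores, the rule that a k-tuple's score is the sum of its per-residue scores, and all function names (`matching_matrix`, `codons_match`, `find_synonymous_hits`) are assumptions made for illustration; only the standard genetic code table is a known fact.

```python
from itertools import product

# Hypothetical toy similarity scores (NOT values from a real PAM matrix).
# Pairs that are not listed score -1.
TOY_SIMILARITY = {
    ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4, ("D", "D"): 4,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1,
}

def residue_score(a, b, sim=TOY_SIMILARITY):
    """Symmetric lookup of the similarity score for two residues."""
    return sim.get((a, b), sim.get((b, a), -1))

def matching_matrix(alphabet, k, threshold, score=residue_score):
    """Return the set of k-tuple pairs that are allowed to match.

    Assumption: a pair of k-tuples matches when the sum of its
    position-wise residue scores reaches the threshold.
    """
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    return {
        (w1, w2)
        for w1 in words
        for w2 in words
        if sum(score(a, b) for a, b in zip(w1, w2)) >= threshold
    }

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}

def codons_match(c1, c2):
    """Two DNA 3-tuples match if they encode the same amino acid."""
    return CODON_TABLE[c1] == CODON_TABLE[c2]

def find_synonymous_hits(query, target):
    """Return the offsets in target where every in-frame codon of query
    matches the corresponding 3-tuple of target (ungapped, one strand).
    """
    n = len(query) - len(query) % 3  # ignore a trailing partial codon
    hits = []
    for offset in range(len(target) - n + 1):
        if all(
            codons_match(query[i:i + 3], target[offset + i:offset + i + 3])
            for i in range(0, n, 3)
        ):
            hits.append(offset)
    return hits
```

With these toy scores and a threshold of 2, `I` and `L` are allowed to match as 1-tuples while `L` and `V` are not. On the DNA side, `CTG` and `TTA` match because both encode leucine, so the query `ATGCTG` hits `GGATGTTAGG` at offset 2 even though the two DNA strings differ there. A real search would use an indexed word lookup rather than the exhaustive scan shown here.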
