Detecting frame shifts by amino acid sequence comparison.

Author information

Claverie J M

Affiliation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894.

Publication information

J Mol Biol. 1993 Dec 20;234(4):1140-57. doi: 10.1006/jmbi.1993.1666.

PMID:7903399

Abstract

Various amino acid substitution scoring matrices are used in conjunction with local alignment programs to detect regions of similarity and infer potential common ancestry between proteins. The usual scoring schemes derive from the implicit hypothesis that related proteins evolve from a common ancestor by the accumulation of point mutations and that amino acids tend to be progressively substituted by others with similar properties. However, other frequent single mutation events, such as nucleotide insertion or deletion and gene inversion, change the translation reading frame and cause previously encoded amino acid sequences to become unrecognizable at once. Here, I derive five new types of scoring matrix, each capable of detecting a specific frame shift (deletion, insertion and inversion in 3 frames), and use them with a regular local alignment program to detect amino acid sequences that may have derived from alternative reading frames of the same nucleotide sequence. Frame shifts are inferred solely from comparison of the protein sequences. The five scoring matrices were used with the BLASTP program to compare all the protein sequences in the Swissprot database. Surprisingly, the searches revealed hundreds of highly significant frame shift matches, many of which are likely to represent sequencing errors. Others provide some evidence that frame shift mutations might be used in protein evolution as a way to create new amino acid sequences from pre-existing coding regions.
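
To illustrate the general idea of a frame shift scoring matrix, the following Python sketch may help. It is not the derivation used in the paper, which the abstract does not describe. It assumes the standard genetic code and uniform codon usage, enumerates every nucleotide quadruplet, and records which amino acid pairs can be encoded by the two overlapping codons (positions 1-3 and 2-4). The function names `translate` and `frameshift_log_odds` are hypothetical, and the log-odds form of the score is an assumption made by analogy with ordinary substitution matrices.

```python
import itertools
import math
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO)
}


def translate(dna, frame=0):
    """Translate a DNA string starting at the given frame offset (0, 1 or 2)."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def frameshift_log_odds():
    """Log-odds scores for residue pairs encoded by codons overlapping by two bases.

    Assumption: every nucleotide quadruplet is equally likely (uniform codon
    usage). Pairs that no quadruplet can encode are absent from the result;
    a usable matrix would have to assign them a negative score.
    """
    pair_counts = Counter()
    for quad in itertools.product(BASES, repeat=4):
        a = CODON_TABLE["".join(quad[:3])]   # residue in the original frame
        b = CODON_TABLE["".join(quad[1:])]   # residue in the shifted frame
        if a != "*" and b != "*":
            pair_counts[(a, b)] += 1

    total = sum(pair_counts.values())
    row, col = Counter(), Counter()
    for (a, b), n in pair_counts.items():
        row[a] += n
        col[b] += n

    # log2 of observed pair frequency over the product of marginal frequencies
    return {
        (a, b): math.log2(n * total / (row[a] * col[b]))
        for (a, b), n in pair_counts.items()
    }


scores = frameshift_log_odds()
```

Under these assumptions, a methionine (ATG) in the original frame can only be aligned with cysteine or tryptophan (TGT, TGC, TGG) in the shifted frame, so `scores` contains `("M", "W")` but not `("M", "A")`. Transposing the table gives the scores for a shift in the opposite direction.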