Henikoff S, Wallace J C
Fred Hutchinson Cancer Research Center, Seattle, WA 98104.
Nucleic Acids Res. 1988 Jul 11;16(13):6191-204. doi: 10.1093/nar/16.13.6191.
A simple procedure is described for finding similarities between proteins using nucleotide sequence databases. The approach is illustrated by several examples of previously unknown correspondences with important biological implications: Drosophila elongation factor Tu is shown to be encoded by two genes that are differently expressed during development; a cluster of three Drosophila genes likely encode maltases; a flesh-fly fat body protein resembles the hypothesized Drosophila alcohol dehydrogenase ancestral protein; an unknown protein encoded at the multifunctional E. coli hisT locus resembles aspartate beta-semialdehyde dehydrogenase; and the E. coli tyrR protein is related to nitrogen regulatory proteins. These and other matches were discovered using a personal computer of the type available in most laboratories collecting DNA sequence data. As relatively few sequences were sampled to find these matches, it is likely that much of the existing data has not been adequately examined.