Parish J H, Bentley J
Department of Biochemistry and Molecular Biology, University of Leeds, United Kingdom.
J Mol Evol. 1996 Feb;42(2):281-93. doi: 10.1007/BF02198855.
We have used three reference sequences representative of bacterial drug resistance pumps and sugar transport proteins to collect the 91 most closely related sequences from a composite, nonredundant protein sequence database. Having eliminated certain very close relatives, we subjected the remainder to analysis and alignment using two different similarity matrices: one was based on structural conservation of amino acid residues in proteins of known conformation and the other was the more familiar mutational matrix. Unrooted similarity trees for these proteins were constructed for each matrix and compared. A systematic analysis of the differences between these trees was undertaken, and the sequences were analyzed for the presence or absence of certain sequence motifs. The results show that the clades created by the two methods are broadly comparable but that some clusters of sequences are significantly different. Further analysis confirmed that (1) the sequences collected by this objective method are all known or putative 12-helix (in some cases reported as 14-helix) transmembrane proteins, (2) evidence for an origin based on gene duplication exists in only a few cases, (3) the bacterial drug resistance pumps are distributed in more than one clade and cannot be regarded as a definitive subset of these proteins, and (4) the diversity is such that there is no evidence of a single ancestral protein. The possible extension of the methods to other cases of divergent protein sequences is discussed.
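The abstract does not say how the two unrooted trees were compared, so the following is only an illustrative sketch of one common approach, not the authors' procedure: list the non-trivial bipartitions (splits) of each tree and report which are shared and which are unique, in the style of a Robinson-Foulds comparison. The nested-tuple tree encoding, the function names, and the five sequence labels are all hypothetical.

```python
from typing import FrozenSet, Set, Tuple, Union

# Assumption: a tree is a leaf label (str) or a tuple of subtrees.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf labels under a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    out: FrozenSet[str] = frozenset()
    for child in tree:
        out |= leaves(child)
    return out


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial bipartitions of a tree read as unrooted.

    Each split is stored as the side that does NOT contain a fixed
    reference leaf, so the same split always has the same representation.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below: FrozenSet[str] = frozenset()
        for child in node:
            below |= visit(child)
        side = all_leaves - below if ref in below else below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def compare(a: Tree, b: Tree) -> Tuple[set, set, set]:
    """Return (shared splits, splits only in a, splits only in b)."""
    sa, sb = splits(a), splits(b)
    return sa & sb, sa - sb, sb - sa


# Hypothetical toy trees standing in for the two matrix-specific trees.
tree_structural = (("pumpA", "pumpB"), ("sugarA", "sugarB"), "pumpC")
tree_mutational = (("pumpA", "pumpB"), ("sugarA", "pumpC"), "sugarB")

shared, only_structural, only_mutational = compare(tree_structural, tree_mutational)
rf_distance = len(only_structural) + len(only_mutational)
print(len(shared), rf_distance)
```

In this toy case the two trees agree on the pumpA/pumpB grouping and disagree on where sugarB and pumpC attach, which is the kind of "broadly comparable but locally different" outcome the abstract reports.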