Cammarano P, Creti R, Sanangelantoni A M, Palm P
Istituto Pasteur Fondazione Cenci-Bolognetti, Dipartimento di Biotecnologie cellulari ed Ematologia, Sezione di Genetica molecolare, Universita' di Roma I "La Sapienza," Policlinico Umberto I, Viale Regina Elena 324, 00161 Roma, Italy.
J Mol Evol. 1999 Oct;49(4):524-37. doi: 10.1007/pl00006574.
A global alignment of EF-G(2) sequences was corrected by reference to protein structure. The selection of characters eligible for construction of phylogenetic trees was optimized by searching for regions arising from the artifactual matching of sequence segments unique to different phylogenetic domains. The spurious matchings were identified by comparing all sections of the global alignment with a comprehensive inventory of significant binary alignments obtained by BLAST probing of the DNA and protein databases with representative EF-G(2) sequences. In three discrete alignment blocks (one in domain II and two in domain IV), the alignment of the bacterial sequences with those of Archaea-Eucarya was not retrieved by database probing with EF-G(2) sequences, and no EF-G homologue of the EF-2 sequence segments was detected by using partial EF-G(2) sequences as probes in BLAST/FASTA searches. The two domain IV regions (one of which comprises the ADP-ribosylatable site of EF-2) are almost certainly due to the artifactual alignment of insertion segments that are unique to Bacteria and to Archaea-Eucarya. Phylogenetic trees were constructed from the global alignment after deselecting positions encompassing the unretrieved, spuriously aligned regions, as well as positions arising from misalignment of the G' and G" subdomain insertion segments flanking the "fifth" consensus motif of the G domain (Ævarsson, 1995). The results show inconsistencies between trees inferred by alternative methods and alternative (DNA and protein) data sets with regard to whether Archaea are a monophyletic or a paraphyletic grouping. Neither the maximum-likelihood nor the maximum-parsimony method allows discrimination (by log-likelihood difference and by difference in number of inferred substitutions, respectively) between the conflicting (monophyletic vs. paraphyletic Archaea) topologies.
No specific EF-2 insertions (or terminal accretions) supporting a crenarchaeal-eucaryal clade are detectable in the new EF-G(2) sequence alignment.
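The deselection step described above (dropping alignment positions judged to be spuriously aligned before tree construction) can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `deselect_columns`, the toy sequences, and the excluded column range are all hypothetical, and the abstract does not give the actual block coordinates.

```python
from typing import Dict, Iterable, Tuple


def deselect_columns(
    alignment: Dict[str, str],
    excluded: Iterable[Tuple[int, int]],
) -> Dict[str, str]:
    """Return a copy of the alignment without the excluded columns.

    `excluded` holds 1-based, inclusive (start, end) column ranges,
    for example blocks judged to be artifactually aligned.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()

    # Collect the 0-based indices of every column to drop.
    drop = set()
    for start, end in excluded:
        if not 1 <= start <= end <= n_cols:
            raise ValueError(f"invalid column range: ({start}, {end})")
        drop.update(range(start - 1, end))

    keep = [i for i in range(n_cols) if i not in drop]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }


# Toy data (hypothetical, not real EF-G or EF-2 sequences): columns 5-6
# stand in for an insertion segment present in only one group.
toy_alignment = {
    "bacterium_EFG": "MAKT--GHIDAGKT",
    "archaeon_EF2": "MGRTLLGHVDHGKS",
    "eukaryote_EF2": "MVNTLLGHVDHGKS",
}
masked = deselect_columns(toy_alignment, [(5, 6)])
for name, seq in masked.items():
    print(name, seq)
```

The masked alignment keeps every sequence at the same length, so it can be passed unchanged to whichever tree-building program is used downstream.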