McCrow John P
J Craig Venter Institute, San Diego, CA 92121, USA.
J Comput Biol. 2009 Nov;16(11):1517-28. doi: 10.1089/cmb.2009.0188.
High levels of alignment error associated with gaps have generally led to the exclusion of gaps from phylogenetic analysis. Conserved insertions and deletions (indels) may in some cases be less subject to error than amino acid substitutions for inferring the history of genomes and identifying recently laterally transferred genes, but alignment error near gaps must be evaluated before indels are used as phylogenetic characters. A method is presented for evaluating the phylogenetic unambiguity of gaps in multiple sequence alignments by allowing a defined amount of pairwise alignment ambiguity. This work considers the bacterial genus Shewanella, which is of particular interest for applications in bioremediation and environmental engineering. Understanding the genetic history of these species is vital for these applications. A set of pairwise dynamic programming alignments is constructed to test positions in multiple alignments for phylogenetic unambiguity, and a whole-genome scan is performed on protein sequences from 11 sequenced species of Shewanella. The splits defined by phylogenetically unambiguous indels are then used as characters for phylogenetic analysis, and the results are compared to a whole-genome maximum likelihood phylogeny. A comparable description of the history of the species is found, as well as a set of lateral gene transfer candidates undetectable by traditional analysis of amino acid substitutions. This analysis is applicable to other taxonomic units at all levels and has the potential to allow cataloging of clear genome-wide phylogenetic markers for taxonomic profiling down to the species level.
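The idea of treating an indel as a split of the taxa can be illustrated with a minimal Python sketch. It is not the published method: the pairwise dynamic programming test for phylogenetic unambiguity is not reproduced here, and the function names (`indel_split`, `compatible`), the toy alignment, and the rule that a gap must cover the whole column range are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# A split is a pair of disjoint taxon sets: (taxa with the gap, taxa without it).
Split = Tuple[FrozenSet[str], FrozenSet[str]]


def indel_split(alignment: Dict[str, str], start: int, end: int) -> Optional[Split]:
    """Return the split induced by a gap covering columns [start, end).

    Hypothetical helper: a taxon counts as "gapped" only if every column in
    the range is '-'. Returns None when either side has fewer than two taxa,
    since such a split carries no grouping information.
    """
    gapped = frozenset(
        name for name, seq in alignment.items() if set(seq[start:end]) == {"-"}
    )
    ungapped = frozenset(alignment) - gapped
    if len(gapped) < 2 or len(ungapped) < 2:
        return None
    return (gapped, ungapped)


def compatible(a: Split, b: Split) -> bool:
    """Two splits of the same taxon set can sit on one tree if and only if
    at least one of the four pairwise intersections of their sides is empty."""
    return any(not (x & y) for x in a for y in b)


# Toy alignment (invented data, not from the Shewanella scan).
aln = {
    "S1": "MKT--LVA",
    "S2": "MKT--LVA",
    "S3": "MKTGALVA",
    "S4": "MKTGALVA",
    "S5": "MRTGALVA",
}
split = indel_split(aln, 3, 5)
```

In this toy case the gap in columns 3 and 4 groups S1 and S2 against the other three taxa. Under these assumptions, a split that conflicts with the splits of a reference phylogeny would be the kind of signal that flags a gene for closer inspection as a lateral transfer candidate.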