Frickey Tancred, Lupas Andrei N
Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Spemannstr. 35, D-72076 Tuebingen, Germany.
Nucleic Acids Res. 2004 Sep 30;32(17):5231-8. doi: 10.1093/nar/gkh867. Print 2004.
Phylogenetic reconstruction is the method of choice for determining the homologous relationships between sequences. Difficulties in producing high-quality alignments, which are the basis of good trees, and in automating the analysis of trees have so far limited the use of phylogenetic reconstruction methods to individual genes or gene families. Because of the large number of sequences involved, phylogenetic analyses of proteomes preclude manual steps and therefore require a high degree of automation in sequence selection, alignment, phylogenetic inference and analysis of the resulting set of trees. We present a set of programs that automates the steps from seed sequence to phylogeny, and a utility that extracts all phylogenies matching specific topological constraints from a database of trees. Two example applications illustrate the type of questions that can be answered by phylome analysis. The generation and analysis of the Thermoplasma acidophilum phylome with regard to lateral gene transfer between Thermoplasmata and Sulfolobus showed best BLAST hits to be far less reliable indicators of lateral transfer than the corresponding protein phylogenies. The generation and analysis of the Danio rerio phylome provided more than twice as many proteins as described previously, supporting the hypothesis of an additional round of genome duplication in the actinopterygian lineage.
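To illustrate what extracting phylogenies by topological constraint can look like, the following is a minimal sketch in Python and not the implementation described in the paper. It assumes that trees are stored as Newick strings, that leaves are labelled with species codes, and that each tree is rooted as written. The function names (`parse_newick`, `matches`, `select_trees`), the species codes and the two example trees are all hypothetical. A tree is selected if some clade contains every required taxon and no forbidden taxon.

```python
def parse_newick(text):
    """Parse a Newick string into nested lists; leaves are plain name strings.

    Branch lengths and internal node labels are read and discarded.
    """
    text = text.strip().rstrip(";")
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        # Drop any ":branch_length" suffix and keep only the name part.
        return text[start:pos].split(":")[0].strip()

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            pos += 1  # skip ")"
            read_label()  # discard support value / branch length
            return children
        return read_label()

    return parse_node()


def leaves(node):
    """Return the set of leaf names below a node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def clades(node):
    """Yield the leaf set of every subtree, including single leaves."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def matches(tree, required, forbidden=()):
    """True if some clade holds all required taxa and no forbidden taxon."""
    required, forbidden = set(required), set(forbidden)
    return any(required <= c and not (forbidden & c) for c in clades(tree))


def select_trees(database, required, forbidden=()):
    """Return the names of all trees in the database that match."""
    return [name for name, newick in database.items()
            if matches(parse_newick(newick), required, forbidden)]


# Hypothetical two-tree "database"; species codes are illustrative only.
trees = {
    "fam1": "((Tac:0.1,Sso:0.2)0.9:0.05,(Pfu:0.3,Mja:0.3):0.1);",
    "fam2": "((Tac:0.1,Pfu:0.2):0.05,(Sso:0.3,Mja:0.3):0.1);",
}
hits = select_trees(trees, required={"Tac", "Sso"}, forbidden={"Pfu", "Mja"})
print(hits)
```

Only `fam1` contains a clade that groups `Tac` with `Sso` to the exclusion of the other two taxa. Because the sketch treats each tree as rooted, an unrooted tree would need to be checked under every possible rooting, or via bipartitions, which is omitted here.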