Huerta-Cepas Jaime, Capella-Gutierrez Salvador, Pryszcz Leszek P, Denisov Ivan, Kormes Diego, Marcet-Houben Marina, Gabaldón Toni
Bioinformatics and Genomics Programme, Centre de Regulació Genòmica, 08003 Barcelona, Spain.
Nucleic Acids Res. 2011 Jan;39(Database issue):D556-60. doi: 10.1093/nar/gkq1109. Epub 2010 Nov 12.
The growing availability of complete genomic sequences from diverse species has brought about the need to scale up phylogenomic analyses, including the reconstruction of large collections of phylogenetic trees. Here, we present the third version of PhylomeDB (http://phylomeDB.org), a public database for genome-wide collections of gene phylogenies (phylomes). Currently, PhylomeDB is the largest phylogenetic repository and hosts 17 phylomes, comprising 416,093 trees and 165,840 alignments. It is also a major source for phylogeny-based orthology and paralogy predictions, covering about 5 million proteins in 717 fully sequenced genomes. For each protein-coding gene in a seed genome, the database provides original and processed alignments, phylogenetic trees derived from various methods and phylogeny-based predictions of orthology and paralogy relationships. The new version of PhylomeDB has been extended with novel data access and visualization features, including the possibility of programmatic access. Available seed species include model organisms such as human, yeast, Escherichia coli or Arabidopsis thaliana, but also alternative model species such as the human pathogen Candida albicans or the pea aphid Acyrthosiphon pisum. Finally, PhylomeDB is currently being used by several genome sequencing projects that couple the genome annotation process with the reconstruction of the corresponding phylome, a strategy that provides relevant evolutionary insights.
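To illustrate what a "phylogeny-based" orthology and paralogy prediction means, here is a minimal sketch of the species-overlap rule, one widely used tree-based criterion. The abstract does not say which algorithm PhylomeDB applies, so this is an assumption, not the database's actual pipeline. Each internal node of a rooted gene tree is treated as a duplication if its child subtrees share at least one species, and as a speciation otherwise. Two genes are then called orthologs when their most recent common ancestor is a speciation node, and paralogs when it is a duplication node. The names `Node`, `leaf`, `join` and `species_overlap_relations` are hypothetical helpers introduced only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import product


@dataclass
class Node:
    """A node in a rooted gene tree (hypothetical minimal representation)."""
    name: str | None = None      # gene identifier, set on leaves only
    species: str | None = None   # species code, set on leaves only
    children: list[Node] = field(default_factory=list)

    def leaves(self) -> list[Node]:
        """Return all leaves (genes) below this node."""
        if not self.children:
            return [self]
        return [lf for child in self.children for lf in child.leaves()]


def leaf(name: str, species: str) -> Node:
    return Node(name=name, species=species)


def join(*children: Node) -> Node:
    return Node(children=list(children))


def species_overlap_relations(root: Node) -> dict[frozenset[str], str]:
    """Label every gene pair as 'ortholog' or 'paralog' using the species-overlap rule.

    Each pair is labeled exactly once, at the internal node where the two genes
    fall into different child subtrees (their most recent common ancestor).
    """
    relations: dict[frozenset[str], str] = {}

    def visit(node: Node) -> None:
        if not node.children:
            return
        child_leaves = [child.leaves() for child in node.children]
        species_sets = [{lf.species for lf in group} for group in child_leaves]
        # Duplication node: at least two child subtrees contain a common species.
        is_duplication = any(
            species_sets[i] & species_sets[j]
            for i in range(len(species_sets))
            for j in range(i + 1, len(species_sets))
        )
        label = "paralog" if is_duplication else "ortholog"
        for i in range(len(child_leaves)):
            for j in range(i + 1, len(child_leaves)):
                for a, b in product(child_leaves[i], child_leaves[j]):
                    relations[frozenset((a.name, b.name))] = label
        for child in node.children:
            visit(child)

    visit(root)
    return relations


# Usage: a gene duplicated before the human/mouse split gives two ortholog groups.
tree = join(
    join(leaf("human_A", "human"), leaf("mouse_A", "mouse")),
    join(leaf("human_B", "human"), leaf("mouse_B", "mouse")),
)
for pair, relation in sorted(species_overlap_relations(tree).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), relation)
```

In this toy tree the root is inferred as a duplication because both subtrees contain human and mouse genes, so human_A and mouse_A (and likewise human_B and mouse_B) are orthologs, while every cross pair is a paralog. Real pipelines typically add refinements such as overlap-score thresholds and rooting strategies, which this sketch omits.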