Penel Simon, Arigon Anne-Muriel, Dufayard Jean-François, Sertier Anne-Sophie, Daubin Vincent, Duret Laurent, Gouy Manolo, Perrière Guy
Laboratoire de Biométrie et Biologie Evolutive, CNRS, Université Claude Bernard - Lyon 1, 43 bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France.
BMC Bioinformatics. 2009 Jun 16;10 Suppl 6(Suppl 6):S3. doi: 10.1186/1471-2105-10-S6-S3.
Comparative genomics is a central step in many sequence analysis studies, from gene annotation and the identification of new functional regions in genomes, to the study of evolutionary processes at the molecular level (speciation, single gene or whole genome duplications, etc.) and phylogenetics. In that context, databases that provide users with high-quality homologous families and sequence alignments, as well as phylogenetic trees based on state-of-the-art algorithms, are becoming indispensable.
We developed an automated procedure that performs massive all-against-all similarity searches, gene clustering, multiple alignment computation, and phylogenetic tree construction and reconciliation. Parallel computing on a large computer cluster makes it possible to apply this procedure to a very large set of sequences.
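The gene clustering step can be illustrated with a minimal sketch. It assumes that the all-against-all search has already produced a list of pairwise hits, and that families are formed by single-linkage grouping above an identity threshold. Both the linkage rule and the names used here (`cluster_families`, `min_identity`, the hit tuple layout) are assumptions for illustration; the abstract does not state the clustering criteria actually used.

```python
from collections import defaultdict

def cluster_families(hits, min_identity=50.0):
    """Group sequences into families by single linkage (illustrative only).

    hits: iterable of (seq_a, seq_b, percent_identity) tuples, assumed to
    come from an all-against-all similarity search.
    min_identity: hypothetical threshold; pairs below it are not linked.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b, identity in hits:
        # Register both sequences so unlinked ones become singleton families.
        find(a)
        find(b)
        if identity >= min_identity:
            union(a, b)

    families = defaultdict(set)
    for seq in list(parent):
        families[find(seq)].add(seq)
    # Return families as sorted lists, ordered by their first member.
    return sorted((sorted(members) for members in families.values()),
                  key=lambda members: members[0])
```

With single linkage, two sequences end up in the same family whenever a chain of sufficiently similar pairs connects them, even if the two are not directly similar to each other. A stricter rule would be needed wherever that behaviour is undesirable.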
Three databases were developed using this procedure: HOVERGEN, HOGENOM and HOMOLENS. These databases share the same architecture but differ in their content. HOVERGEN contains sequences from vertebrates, HOGENOM is mainly devoted to completely sequenced microbial organisms, and HOMOLENS is devoted to metazoan genomes from Ensembl. Access to the databases is provided through Web query forms, a general retrieval system and a client-server graphical interface. The latter can be used to perform tree-pattern based searches, which allow, among other uses, the retrieval of sets of orthologous genes. The three databases, as well as the software required to build and query them, can be used or downloaded from the PBIL (Pôle Bioinformatique Lyonnais) site at http://pbil.univ-lyon1.fr/.
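As a rough illustration of how a tree-pattern search can retrieve orthologs, the sketch below assumes a reconciled gene tree whose internal nodes are labelled as speciation or duplication events. It returns the gene sets of the topmost duplication-free subtrees that contain every requested species. The `Node` class, the event labels and the function names are hypothetical and do not reflect the query interface of the databases.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

@dataclass
class Node:
    # event is "leaf", "speciation" or "duplication" (assumed labels
    # produced by tree reconciliation).
    event: str = "leaf"
    gene: Optional[str] = None
    species: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

def leaves(node: Node) -> List[Tuple[str, str]]:
    """Collect (gene, species) pairs below a node."""
    if node.event == "leaf":
        return [(node.gene, node.species)]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found

def has_duplication(node: Node) -> bool:
    """True if the node or any descendant is a duplication event."""
    return node.event == "duplication" or any(
        has_duplication(child) for child in node.children)

def find_ortholog_sets(node: Node, required_species: Set[str]) -> List[List[str]]:
    """Return gene sets of duplication-free subtrees covering all required species."""
    if node.event != "leaf" and not has_duplication(node):
        found = leaves(node)
        if required_species <= {species for _, species in found}:
            return [sorted(gene for gene, _ in found)]
        return []  # no smaller subtree can contain more species
    results = []
    for child in node.children:
        results.extend(find_ortholog_sets(child, required_species))
    return results
```

Genes in a subtree with no duplication node are related only through speciation events, which is the usual working definition of orthology. The pattern here is deliberately simple; a real tree-pattern query could also constrain topology or taxonomic groups.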