Databases of homologous gene families for comparative genomics.

Author information

Penel Simon, Arigon Anne-Muriel, Dufayard Jean-François, Sertier Anne-Sophie, Daubin Vincent, Duret Laurent, Gouy Manolo, Perrière Guy

Affiliation

Laboratoire de Biométrie et Biologie Evolutive, CNRS, Université Claude Bernard - Lyon 1, 43 bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France.

Publication information

BMC Bioinformatics. 2009 Jun 16;10 Suppl 6(Suppl 6):S3. doi: 10.1186/1471-2105-10-S6-S3.

DOI:10.1186/1471-2105-10-S6-S3

PMID:19534752

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2697650/

Abstract

BACKGROUND

Comparative genomics is a central step in many sequence analysis studies, from gene annotation and the identification of new functional regions in genomes, to the study of evolutionary processes at the molecular level (speciation, single gene or whole genome duplications, etc.) and phylogenetics. In that context, databases that provide users with high-quality homologous families and sequence alignments, as well as phylogenetic trees based on state-of-the-art algorithms, are becoming indispensable.

METHODS

We developed an automated procedure that performs massive all-against-all similarity searches, gene clustering, multiple alignment computation, and phylogenetic tree construction and reconciliation. Applying this procedure to a very large set of sequences is made possible by parallel computing on a large computer cluster.
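
As a rough illustration of the gene clustering step, the sketch below groups genes into families by single linkage over pairwise similarity hits, using a union-find structure. The abstract does not state the clustering rule or any thresholds, so the `cluster_families` function, the `(gene_a, gene_b, percent_identity)` input format, and the `min_identity` cutoff are all assumptions made for illustration, not the authors' procedure.

```python
from collections import defaultdict


def cluster_families(hits, min_identity=50.0):
    """Group genes into families by single linkage over pairwise hits.

    hits: iterable of (gene_a, gene_b, percent_identity) tuples, for example
        the filtered output of an all-against-all similarity search.
    min_identity: hypothetical cutoff, not taken from the source.
    Returns a sorted list of families, each a sorted list of gene names.
    """
    parent = {}

    def find(gene):
        # Register unseen genes as their own family, then walk to the root.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for gene_a, gene_b, identity in hits:
        root_a, root_b = find(gene_a), find(gene_b)
        # Merge the two families only when the hit is strong enough.
        if identity >= min_identity and root_a != root_b:
            parent[root_b] = root_a

    families = defaultdict(list)
    for gene in parent:
        families[find(gene)].append(gene)
    return sorted(sorted(members) for members in families.values())


# Toy input: g1-g2-g3 are chained by strong hits; g4-g5 share only a weak hit.
example_hits = [("g1", "g2", 80.0), ("g2", "g3", 60.0), ("g4", "g5", 30.0)]
example_families = cluster_families(example_hits)
```

Single linkage is the simplest choice here: any sufficiently strong hit merges two families, which is easy to parallelise but can chain distantly related genes together. A production pipeline would typically add coverage or alignment-length criteria before merging.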

RESULTS

Three databases were developed using this procedure: HOVERGEN, HOGENOM and HOMOLENS. These databases share the same architecture but differ in their content. HOVERGEN contains sequences from vertebrates, HOGENOM is mainly devoted to completely sequenced microbial organisms, and HOMOLENS is devoted to metazoan genomes from Ensembl. Access to the databases is provided through Web query forms, a general retrieval system and a client-server graphical interface. The latter can be used to perform tree-pattern-based searches, which allow, among other uses, the retrieval of sets of orthologous genes. The three databases, as well as the software required to build and query them, can be used or downloaded from the PBIL (Pôle Bioinformatique Lyonnais) site at http://pbil.univ-lyon1.fr/.
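
To give a feel for what a tree-pattern-based search for orthologs might look like, the toy sketch below scans a gene tree for subtrees that contain exactly one gene from each requested species. The tree encoding (nested lists with `(species, gene)` tuples as leaves), the function names, and the matching rule are assumptions for illustration only; this is not the algorithm or the API of the graphical interface described above.

```python
def leaves(node):
    """Return the (species, gene) leaves under a node.

    A leaf is a (species, gene) tuple; an internal node is a list of children.
    """
    if isinstance(node, tuple):
        return [node]
    found = []
    for child in node:
        found.extend(leaves(child))
    return found


def find_ortholog_sets(tree, required_species):
    """Collect subtrees holding exactly one gene per required species.

    Returns a list of {species: gene} dicts, one per matching subtree,
    in left-to-right order of the tree.
    """
    required = sorted(required_species)
    matches = []

    def visit(node):
        found = leaves(node)
        # A match has each required species once and no other species.
        if sorted(species for species, _ in found) == required:
            matches.append({species: gene for species, gene in found})
            return
        if isinstance(node, list):
            for child in node:
                visit(child)

    visit(tree)
    return matches


# Toy gene tree: a duplication at the root, then a human/mouse split in
# each copy, so the search should report two separate ortholog pairs.
example_tree = [
    [("human", "H1"), ("mouse", "M1")],
    [("human", "H2"), ("mouse", "M2")],
]
example_matches = find_ortholog_sets(example_tree, {"human", "mouse"})
```

In this example the root itself does not match because each species appears twice under it, so the search descends and reports the two copies separately, which is the behaviour one would want when separating orthologs from paralogs.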
