Comprehensive analysis of orthologous protein domains using the HOPS database.

Author information

Storm Christian E V, Sonnhammer Erik L L

Affiliation

Center for Genomics and Bioinformatics, Karolinska Institutet, S-17177 Stockholm, Sweden.

Publication information

Genome Res. 2003 Oct;13(10):2353-62. doi: 10.1101/gr1305203.

DOI:10.1101/gr1305203

PMID:14525933

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC403726/

Abstract

One of the most reliable methods for protein function annotation is to transfer experimentally known functions from orthologous proteins in other organisms. Most methods for identifying orthologs operate on a subset of organisms with a completely sequenced genome, and treat proteins as single-domain units. However, it is well known that proteins are often made up of several independent domains, and there is a wealth of protein sequences from genomes that are not completely sequenced. A comprehensive set of protein domain families is found in the Pfam database. We wanted to apply orthology detection to Pfam families, but first some issues needed to be addressed. First, orthology detection becomes impractical and unreliable when too many species are included. Second, shorter domains contain less information. It is therefore important to assess the quality of the orthology assignment and avoid very short domains altogether. We present a database of orthologous protein domains in Pfam called HOPS: Hierarchical grouping of Orthologous and Paralogous Sequences. Orthology is inferred in a hierarchic system of phylogenetic subgroups using ortholog bootstrapping. To avoid the frequent errors stemming from horizontally transferred genes in bacteria, the analysis is presently limited to eukaryotic genes. The results are accessible in the graphical browser NIFAS, a Java tool originally developed for analyzing phylogenetic relations within Pfam families. The method was tested on a set of curated orthologs with experimentally verified function. In comparison to tree reconciliation with a complete species tree, our approach finds significantly more orthologs in the test set. Examples for investigating gene fusions and domain recombination using HOPS are given.
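
The idea behind ortholog bootstrapping, as the abstract describes it, is to score an orthology assignment by how often it recurs across resampled analyses. The following is a minimal sketch of that idea only, not the HOPS or Orthostrapper implementation. It assumes each bootstrap replicate is already reduced to a table of pairwise distances, and it substitutes a reciprocal-nearest-neighbour rule for a tree-based orthology criterion. All names (`reciprocal_pairs`, `ortholog_support`, the sequence labels, and the distance values) are hypothetical.

```python
from collections import Counter


def as_table(pairs):
    """Turn {(a, b): distance} into a table keyed by unordered pairs."""
    return {frozenset(k): v for k, v in pairs.items()}


def reciprocal_pairs(dist, species):
    """Cross-species pairs that are each other's nearest neighbour.

    This is a simplified stand-in (an assumption) for deciding
    orthology from a single bootstrap tree.
    """
    def nearest(seq):
        others = [s for s in species if species[s] != species[seq]]
        # Ties are broken by sequence name to keep the result deterministic.
        return min(others, key=lambda o: (dist[frozenset((seq, o))], o))

    return {
        frozenset((s, nearest(s)))
        for s in species
        if nearest(nearest(s)) == s
    }


def ortholog_support(replicates, species):
    """Fraction of bootstrap replicates in which each pair is called orthologous."""
    counts = Counter()
    for dist in replicates:
        counts.update(reciprocal_pairs(dist, species))
    return {pair: n / len(replicates) for pair, n in counts.items()}


# Hypothetical toy data: two domains per species, three bootstrap replicates.
species = {"hsA": "human", "hsB": "human", "mmA": "mouse", "mmB": "mouse"}
clear = as_table({("hsA", "mmA"): 0.1, ("hsB", "mmB"): 0.2,
                  ("hsA", "mmB"): 0.8, ("hsB", "mmA"): 0.9})
noisy = as_table({("hsA", "mmA"): 0.1, ("hsB", "mmA"): 0.3,
                  ("hsA", "mmB"): 0.5, ("hsB", "mmB"): 0.6})

support = ortholog_support([clear, clear, noisy], species)
for pair, value in sorted(support.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), round(value, 2))
```

A pair recovered in every replicate receives a support of 1.0, while a pair that depends on a particular resampling receives a lower value. This is the sense in which the support value can be used to assess the quality of an orthology assignment.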
