Automatic clustering of orthologs and in-paralogs from pairwise species comparisons.

Authors

Remm M, Storm C E, Sonnhammer E L

Affiliation

Center for Genomics and Bioinformatics, Karolinska Institutet, S-17177 Stockholm, Sweden.

Publication details

J Mol Biol. 2001 Dec 14;314(5):1041-52. doi: 10.1006/jmbi.2000.5197.

Abstract

Orthologs are genes in different species that originate from a single gene in the last common ancestor of these species. Such genes have often retained identical biological roles in the present-day organisms. It is hence important to identify orthologs for transferring functional information between genes in different organisms with a high degree of reliability. For example, orthologs of human proteins are often functionally characterized in model organisms. Unfortunately, orthology analysis between human and, for example, invertebrates is often complex because of large numbers of paralogs within protein families. Paralogs that predate the species split, which we call out-paralogs, can easily be confused with true orthologs. Paralogs that arose after the species split, which we call in-paralogs, however, are bona fide orthologs by definition. Orthologs and in-paralogs are typically detected with phylogenetic methods, but these are slow and difficult to automate. Automatic clustering methods based on two-way best genome-wide matches, on the other hand, have so far not separated in-paralogs from out-paralogs effectively. We present a fully automatic method for finding orthologs and in-paralogs from two species. Ortholog clusters are seeded with a two-way best pairwise match, after which an algorithm for adding in-paralogs is applied. The method bypasses multiple alignments and phylogenetic trees, which can be slow and error-prone steps in classical ortholog detection. Still, it robustly detects complex orthologous relationships and assigns confidence values for both orthologs and in-paralogs. The program, called INPARANOID, was tested on all completely sequenced eukaryotic genomes. To assess the quality of INPARANOID results, ortholog clusters were generated from a dataset of worm and mammalian transmembrane proteins, and were compared to clusters derived by manual tree-based ortholog detection methods.
This study led to the high-confidence identification of more than a dozen novel worm-mammalian ortholog assignments that were previously undetected because of shortcomings of phylogenetic methods. A WWW server that allows searching for orthologs between human and several fully sequenced genomes is installed at http://www.cgb.ki.se/inparanoid/. This is the first comprehensive resource with orthologs of all fully sequenced eukaryotic genomes. Programs and tables of orthology assignments are available from the same location.
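
The two-step idea in the abstract (seed each cluster with a two-way best match, then add in-paralogs) can be illustrated with a minimal Python sketch. This is not the INPARANOID program. The function names, the toy score table, and the in-paralog rule used here (a gene joins a cluster if it scores higher against the seed ortholog of its own species than the two seed orthologs score against each other) are assumptions made for illustration. Confidence values and the resolution of overlapping clusters are left out.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input: pairwise similarity scores (for example, alignment bit
# scores) keyed by gene-name pairs. Missing pairs are treated as "no match".
Scores = Dict[Tuple[str, str], float]


def sim(scores: Scores, x: str, y: str) -> float:
    """Symmetric lookup: take the better of the two directions, 0.0 if absent."""
    return max(scores.get((x, y), 0.0), scores.get((y, x), 0.0))


def best_hit(query: str, candidates: List[str], scores: Scores) -> Optional[str]:
    """Return the candidate with the highest positive score to query, if any."""
    scored = [(sim(scores, query, c), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s > 0.0]
    return max(scored)[1] if scored else None


def cluster_orthologs(genes_a: List[str], genes_b: List[str],
                      scores: Scores) -> List[dict]:
    """Seed clusters with two-way best matches, then add in-paralogs."""
    clusters = []
    for a in genes_a:
        b = best_hit(a, genes_b, scores)
        # Keep the pair only if the best match holds in both directions.
        if b is None or best_hit(b, genes_a, scores) != a:
            continue
        seed_score = sim(scores, a, b)
        # Assumed rule: a same-species gene that is closer to the seed
        # ortholog than the seed pair is to each other is an in-paralog.
        in_a = [g for g in genes_a if g != a and sim(scores, a, g) > seed_score]
        in_b = [g for g in genes_b if g != b and sim(scores, b, g) > seed_score]
        clusters.append({"seed": (a, b),
                         "in_paralogs_a": in_a,
                         "in_paralogs_b": in_b})
    return clusters


# Toy data (invented): a2 is a recent duplicate of a1 in species A.
demo_scores: Scores = {
    ("a1", "b1"): 100.0, ("a2", "b1"): 80.0, ("a1", "a2"): 150.0,
    ("a3", "b2"): 90.0, ("a1", "b2"): 20.0, ("a3", "b1"): 10.0,
    ("a1", "a3"): 30.0,
}
demo_clusters = cluster_orthologs(["a1", "a2", "a3"], ["b1", "b2"], demo_scores)
for cluster in demo_clusters:
    print(cluster)
```

In the toy data, a1 and b1 are each other's best cross-species match and seed one cluster, and a2 joins it as an in-paralog because it is closer to a1 than a1 is to b1. A gene such as a2, whose best cross-species hit is not reciprocated, seeds no cluster of its own.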
