OrthoMCL: identification of ortholog groups for eukaryotic genomes.

Authors

Li Li, Stoeckert Christian J, Roos David S

Affiliations

Department of Biology and Genetics, Center for Bioinformatics, and Genomics Institute, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.

Publication information

Genome Res. 2003 Sep;13(9):2178-89. doi: 10.1101/gr.1224503.

Abstract

The identification of orthologous groups is useful for genome annotation, studies on gene/protein evolution, comparative genomics, and the identification of taxonomically restricted sequences. Methods successfully exploited for prokaryotic genome analysis have proved difficult to apply to eukaryotes, however, as larger genomes may contain multiple paralogous genes, and sequence information is often incomplete. OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs. This method performs similarly to the INPARANOID algorithm when applied to two genomes, but can be extended to cluster orthologs from multiple species. OrthoMCL clusters are coherent with groups identified by EGO, but improved recognition of "recent" paralogs permits overlapping EGO groups representing the same gene to be merged. Comparison with previously assigned EC annotations suggests a high degree of reliability, implying utility for automated eukaryotic genome annotation. OrthoMCL has been applied to the proteome data set from seven publicly available genomes (human, fly, worm, yeast, Arabidopsis, the malaria parasite Plasmodium falciparum, and Escherichia coli). A Web interface allows queries based on individual genes or user-defined phylogenetic patterns (http://www.cbil.upenn.edu/gene-family). Analysis of clusters incorporating P. falciparum genes identifies numerous enzymes that were incompletely annotated in first-pass annotation of the parasite genome.
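The abstract names a Markov Cluster algorithm as the grouping step. The sketch below illustrates the generic Markov clustering idea (alternating expansion and inflation of a column-stochastic matrix) on a toy similarity graph. It is not the OrthoMCL implementation: the function names, the toy weights, the inflation value of 1.5, and the convergence thresholds are all assumptions made for illustration, and the abstract does not state how edge weights are derived from sequence similarity.

```python
def normalize_columns(matrix):
    """Scale every column so it sums to 1 (column-stochastic form)."""
    n = len(matrix)
    out = [row[:] for row in matrix]
    for j in range(n):
        col_sum = sum(matrix[i][j] for i in range(n))
        if col_sum > 0:
            for i in range(n):
                out[i][j] = matrix[i][j] / col_sum
    return out


def matmul(a, b):
    """Plain O(n^3) product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(weights, inflation=1.5, max_iter=100, tol=1e-9, eps=1e-6):
    """Toy Markov clustering over a symmetric similarity matrix.

    `weights[i][j]` is a hypothetical similarity score between genes i and j.
    `inflation`, `max_iter`, `tol`, and `eps` are illustrative assumptions.
    Returns a list of clusters, each a sorted list of gene indices.
    """
    n = len(weights)
    # Add self-loops so every node keeps some probability mass on itself.
    m = [[max(weights[i][j], 1.0) if i == j else weights[i][j]
          for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)                              # expansion step
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded]
        )                                                    # inflation step
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows with mass on the diagonal act as attractors; the nonzero entries
    # of such a row are the members of that attractor's cluster.
    clusters, seen = [], set()
    for i in range(n):
        if m[i][i] > eps:
            members = frozenset(j for j in range(n) if m[i][j] > eps)
            if members not in seen:
                seen.add(members)
                clusters.append(sorted(members))
    return clusters


# Toy input: genes 0-2 are mutually similar, genes 3-4 are mutually similar,
# and there are no edges between the two sets.
toy_weights = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(sorted(mcl(toy_weights)))  # [[0, 1, 2], [3, 4]]
```

In this sketch a higher inflation value tends to split the graph into smaller, tighter clusters, while a lower value tends to merge them. On real data the weights would come from pairwise sequence comparisons rather than hand-written 0/1 values.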
