Using shared genomic synteny and shared protein functions to enhance the identification of orthologous gene pairs.

Author information

Zheng Xiangqun H, Lu Fu, Wang Zhen-Yuan, Zhong Fei, Hoover Jeffrey, Mural Richard

Affiliation

Assays and Bioinformatics, Celera Genomics Corporation, 45 West Gude Drive, Rockville, MD 20850, USA.

Publication information

Bioinformatics. 2005 Mar;21(6):703-10. doi: 10.1093/bioinformatics/bti045. Epub 2004 Sep 30.

DOI:10.1093/bioinformatics/bti045

PMID:15458983

Abstract

MOTIVATION

The identification of orthologous gene pairs is generally based on sequence similarity. Gene pairs that are mutual 'best hits' between the genomes being compared are asserted to be orthologs. Although this method identifies most orthologous gene pairs with high confidence, it misses a fraction of them, especially genes in duplicated gene families. In addition, the approach depends heavily on the completeness and quality of gene annotation: when gene sequences are not correctly represented, it is unlikely to find the correct ortholog. To overcome these limitations, we have developed an approach that identifies orthologous gene pairs using shared chromosomal synteny and the annotation of protein function.
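The mutual best-hit rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `(query, subject, score)` hit format, the function names, and the toy gene identifiers are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit record: (query_id, subject_id, similarity_score).
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def mutual_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)


# Toy two-member gene family (invented identifiers and scores).
human_vs_mouse = [("hA", "mA", 900.0), ("hA", "mB", 400.0),
                  ("hB", "mA", 850.0), ("hB", "mB", 300.0)]
mouse_vs_human = [("mA", "hA", 900.0), ("mA", "hB", 850.0),
                  ("mB", "hA", 400.0), ("mB", "hB", 300.0)]

pairs = mutual_best_hits(human_vs_mouse, mouse_vs_human)
```

In this toy family only `hA`/`mA` pass the mutual test. `hB` and `mB` are left unpaired because each one's best hit prefers a different partner, which is the kind of miss in duplicated gene families that the abstract refers to.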

RESULTS

Assembled mouse and human genomes were used to identify regions of conserved synteny between the two genomes. 'Syntenic anchors' are conserved, non-repetitive locations shared by the mouse and human genomes. Using these anchors, we identified blocks of sequence in which the anchors are consistently ordered in both genomes (syntenic blocks). This synteny information was then used to help identify orthologous gene pairs between mouse and human. The approach combines mutual selection of the best tBlastX hits between human and mouse transcripts with inference of orthologous relationships from three kinds of evidence: sharing syntenic anchors, co-locating in the same syntenic block, and sharing the same annotated protein function. Using this approach, we found 19,357 orthologous gene pairs between the human and mouse genomes, a 20% increase over the number of orthologs identified by conventional approaches.
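One way to picture "blocks that contain consistently ordered anchors" is the sketch below. The abstract does not give the actual block-building criteria, so the rule used here is an assumption: consecutive anchors stay on the same chromosome pair and their mouse positions keep moving in one direction. The `Anchor` type, `build_blocks`, and the coordinates are hypothetical.

```python
from typing import List, NamedTuple


class Anchor(NamedTuple):
    """Hypothetical syntenic anchor: one conserved location in each genome."""
    human_chrom: str
    human_pos: int
    mouse_chrom: str
    mouse_pos: int


def build_blocks(anchors: List[Anchor], min_anchors: int = 2) -> List[List[Anchor]]:
    """Group anchors into runs whose order is consistent in both genomes.

    Assumed rule: walking along the human genome, an anchor extends the current
    block if it stays on the same human and mouse chromosomes and its mouse
    position continues in the block's direction (ascending or descending).
    """
    ordered = sorted(anchors, key=lambda a: (a.human_chrom, a.human_pos))
    blocks: List[List[Anchor]] = []
    current: List[Anchor] = []
    direction = 0  # +1 ascending in mouse, -1 descending, 0 not yet known
    for anchor in ordered:
        if current:
            prev = current[-1]
            same_chroms = (anchor.human_chrom == prev.human_chrom
                           and anchor.mouse_chrom == prev.mouse_chrom)
            step = (anchor.mouse_pos > prev.mouse_pos) - (anchor.mouse_pos < prev.mouse_pos)
            if same_chroms and step != 0 and direction in (0, step):
                direction = step
                current.append(anchor)
                continue
            # Order broken: keep the finished block if it is large enough.
            if len(current) >= min_anchors:
                blocks.append(current)
        current = [anchor]
        direction = 0
    if len(current) >= min_anchors:
        blocks.append(current)
    return blocks


# Invented coordinates: one forward block, one inverted block, one lone anchor.
anchors = [
    Anchor("chr1", 100, "chr4", 5000), Anchor("chr1", 200, "chr4", 5200),
    Anchor("chr1", 300, "chr4", 5400), Anchor("chr1", 400, "chr7", 900),
    Anchor("chr1", 500, "chr7", 700), Anchor("chr1", 600, "chr7", 500),
    Anchor("chr2", 100, "chr9", 50),
]
blocks = build_blocks(anchors)
```

With these invented anchors the function returns two blocks of three anchors each and discards the isolated `chr2` anchor. A human gene and a mouse gene that overlap anchors in the same block would then count as sharing synteny evidence, alongside tBlastX hits and matching functional annotation.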
