Using shared genomic synteny and shared protein functions to enhance the identification of orthologous gene pairs.

Author information

Zheng Xiangqun H, Lu Fu, Wang Zhen-Yuan, Zhong Fei, Hoover Jeffrey, Mural Richard

Affiliation

Assays and Bioinformatics, Celera Genomics Corporation, 45 West Gude Drive, Rockville, MD 20850, USA.

Publication information

Bioinformatics. 2005 Mar;21(6):703-10. doi: 10.1093/bioinformatics/bti045. Epub 2004 Sep 30.

Abstract

MOTIVATION

The identification of orthologous gene pairs is generally based on sequence similarity. Gene pairs that are mutually 'best hits' between the genomes being compared are asserted to be orthologs. Although this method identifies most orthologous gene pairs with high confidence, it will miss a fraction of them, especially genes in duplicated gene families. In addition, the approach depends heavily on the completeness and quality of gene annotation. When the gene sequences are not correctly represented, the approach is unlikely to find the correct ortholog. To overcome these limitations, we have developed an approach to identify orthologous gene pairs using shared chromosomal synteny and the annotation of protein function.
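The mutual 'best hit' criterion described above, and the way it can miss genes in duplicated families, can be illustrated with a minimal Python sketch. The gene names, scores and function names below are hypothetical and chosen only for illustration; they are not taken from the paper, and a real pipeline would read scored alignments from a similarity search rather than from hard-coded tuples.

```python
from typing import Dict, Iterable, Set, Tuple

# One similarity hit: (query gene, subject gene, alignment score).
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query gene to its single highest-scoring subject gene."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        # Keep the first hit seen on ties (strict comparison).
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    human_vs_mouse: Iterable[Hit], mouse_vs_human: Iterable[Hit]
) -> Set[Tuple[str, str]]:
    """Return (human, mouse) pairs that are each other's best hit."""
    h2m = best_hits(human_vs_mouse)
    m2h = best_hits(mouse_vs_human)
    return {(h, m) for h, m in h2m.items() if m2h.get(m) == h}


# Hypothetical duplicated family: H1/H2 in human, M1/M2 in mouse.
human_vs_mouse = [
    ("H1", "M1", 900.0), ("H1", "M2", 850.0),
    ("H2", "M1", 880.0), ("H2", "M2", 870.0),
]
mouse_vs_human = [
    ("M1", "H1", 900.0), ("M1", "H2", 880.0),
    ("M2", "H1", 850.0), ("M2", "H2", 870.0),
]

pairs = reciprocal_best_hits(human_vs_mouse, mouse_vs_human)
print(pairs)
```

In this toy data, H2 scores slightly higher against M1 than against M2, so the H2/M2 pair is not mutual and is dropped even though M2's best hit is H2. This is the kind of case where additional evidence such as synteny could help.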

RESULTS

Assembled mouse and human genomes were used to identify the regions of conserved synteny between these genomes. 'Syntenic anchors' are conserved non-repetitive locations between mouse and human genomes. Using these anchors, we identified blocks of sequences that contain consistently ordered anchors between the two genomes (syntenic blocks). The synteny information was then used to help identify orthologous gene pairs between mouse and human genomes. The approach combines the mutual selection of the best tBlastX hits between human and mouse transcripts with the inference of orthologous relationships based on sharing syntenic anchors, co-locating in the same syntenic blocks and sharing the same annotated protein function. Using this approach, we were able to find 19,357 orthologous gene pairs between human and mouse genomes, a 20% increase over the number of orthologs identified by conventional approaches.
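The idea of grouping consistently ordered anchors into syntenic blocks can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: the abstract does not say how blocks were delimited, so the rule used here (same chromosome pair, mouse positions moving in one consistent direction) and all names and coordinates are hypothetical. A practical implementation would likely also need distance thresholds and tolerance for isolated out-of-order anchors.

```python
from typing import List, NamedTuple


class Anchor(NamedTuple):
    """A conserved location present in both genomes (hypothetical layout)."""
    human_chr: str
    human_pos: int
    mouse_chr: str
    mouse_pos: int


def syntenic_blocks(anchors: List[Anchor]) -> List[List[Anchor]]:
    """Group anchors into runs with consistent order in both genomes."""
    blocks: List[List[Anchor]] = []
    current: List[Anchor] = []
    direction = 0  # +1 increasing in mouse, -1 decreasing, 0 not yet known

    # Walk the anchors in human genome order.
    for a in sorted(anchors, key=lambda x: (x.human_chr, x.human_pos)):
        if current:
            prev = current[-1]
            same_chrs = (a.human_chr == prev.human_chr
                         and a.mouse_chr == prev.mouse_chr)
            # Sign of the step along the mouse chromosome.
            step = (a.mouse_pos > prev.mouse_pos) - (a.mouse_pos < prev.mouse_pos)
            if same_chrs and step != 0 and direction in (0, step):
                direction = step
            else:
                # Order or chromosome changed: close the current block.
                blocks.append(current)
                current = []
                direction = 0
        current.append(a)

    if current:
        blocks.append(current)
    return blocks


# Hypothetical anchors: a forward run, an inverted run, then another chromosome.
anchors = [
    Anchor("chr1", 100, "chr4", 1000),
    Anchor("chr1", 200, "chr4", 2000),
    Anchor("chr1", 300, "chr4", 3000),
    Anchor("chr1", 400, "chr4", 8000),
    Anchor("chr1", 500, "chr4", 7000),
    Anchor("chr1", 600, "chr4", 6000),
    Anchor("chr1", 700, "chr11", 500),
]

blocks = syntenic_blocks(anchors)
block_sizes = [len(b) for b in blocks]
print(block_sizes)
```

Once genes are assigned to blocks in this way, a candidate pair that shares anchors or a block and carries the same functional annotation could be considered even when it is not a mutual best hit; how the authors weighted these lines of evidence is not stated in the abstract.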
