Istituto di Zootecnica, Università Cattolica del Sacro Cuore, 29100 Piacenza, Italy.
BMC Genomics. 2009 Dec 14;10:604. doi: 10.1186/1471-2164-10-604.
With the rapid growth in the availability of genome sequence data, the automated identification of orthologous genes between species (orthologs) is of fundamental importance to facilitate functional annotation and studies in comparative and evolutionary genomics. Genes with no apparent orthologs between the bovine and human genomes may be responsible for major differences between the species; however, such genes are often neglected in functional genomics studies.
A BLAST-based method was used to explore the current annotation and orthology predictions in Ensembl. Genes with no orthologs between the two genomes were classified into groups based on alignments, ontology, manual curation and publicly available information. Starting from the high-quality, specific set of orthology predictions provided by Ensembl, a highly sensitive approach based on sequence similarity and genomic comparison unveiled hidden relationships between the genes and genomes of different mammalian species.
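The grouping step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns) from searching the genes that lack an Ensembl ortholog against the other species' proteins, keeps the highest-bitscore hit per gene, and uses a hypothetical E-value cutoff (`max_evalue`) to separate genes with no usable match from genes with one. The function names `best_hits` and `group_genes` are also hypothetical.

```python
import csv
import io
from typing import Dict, Iterable

# Column names of BLAST+ default tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(lines: Iterable[str]) -> Dict[str, dict]:
    """Return the highest-bitscore hit for each query in outfmt 6 text."""
    best: Dict[str, dict] = {}
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


def group_genes(gene_ids, lines, max_evalue=1e-10):
    """Split genes lacking an ortholog into 'no_match' and 'match' groups.

    max_evalue is an illustrative assumption, not a value from the study.
    """
    hits = best_hits(lines)
    groups = {"no_match": [], "match": []}
    for gene in gene_ids:
        hit = hits.get(gene)
        if hit is None or hit["evalue"] > max_evalue:
            groups["no_match"].append(gene)
        else:
            groups["match"].append(gene)
    return groups


if __name__ == "__main__":
    blast_out = io.StringIO(
        "geneA\thsX\t98.5\t300\t4\t0\t1\t300\t1\t300\t1e-150\t550\n"
        "geneA\thsY\t60.0\t120\t48\t2\t1\t120\t5\t124\t1e-20\t90\n"
        "geneB\thsZ\t40.0\t50\t30\t1\t1\t50\t1\t50\t0.5\t20\n"
    )
    print(group_genes(["geneA", "geneB", "geneC"], blast_out))
    # {'no_match': ['geneB', 'geneC'], 'match': ['geneA']}
```

Genes in the `match` group would then go on to the finer classification (alignment inspection, ontology, manual curation) described above; `geneC`, which produced no BLAST line at all, corresponds to a gene with no match in the other species.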
The analysis identified 3,801 bovine genes with no orthologs in human and 1,010 human genes with no orthologs in cow, of which 411 and 43 genes, respectively, had no match at all in the other species. Most of the apparently non-orthologous genes may have orthologs that were missed in the annotation process despite sharing a high percentage of identity, because of differences in gene length and structure. The comparative analysis reported here identified gene variants, new genes and species-specific features, and gave an overview of the other side of orthology, which may help to improve the annotation of the bovine genome and the knowledge of structural differences between species.
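The length argument can be made concrete with a small check. This sketch assumes each gene's best hit is summarized in a hypothetical `BestHit` record holding the percentage identity and the lengths of both proteins; the thresholds `min_identity` and `max_len_ratio` are illustrative assumptions, not values reported in the study. A gene is flagged as a candidate missed ortholog when identity is high but one protein is much shorter than the other, for example because of a truncated or differently structured gene model.

```python
from dataclasses import dataclass


@dataclass
class BestHit:
    """Best BLAST hit of a gene lacking an Ensembl ortholog (hypothetical record)."""
    gene_id: str
    pident: float     # percentage identity over the aligned region
    query_len: int    # length of the query protein (residues)
    subject_len: int  # length of the subject protein (residues)


def likely_missed_ortholog(hit: BestHit, min_identity: float = 90.0,
                           max_len_ratio: float = 0.8) -> bool:
    """True when identity is high but the two proteins differ markedly in length.

    Both thresholds are illustrative assumptions, not values from the study.
    """
    shorter, longer = sorted((hit.query_len, hit.subject_len))
    if longer == 0:
        return False
    return hit.pident >= min_identity and shorter / longer <= max_len_ratio


if __name__ == "__main__":
    hits = [
        BestHit("bov1", 97.0, 350, 480),  # high identity, very different lengths
        BestHit("bov2", 97.0, 450, 470),  # high identity, similar lengths
        BestHit("bov3", 45.0, 200, 480),  # low identity
    ]
    print([h.gene_id for h in hits if likely_missed_ortholog(h)])
    # ['bov1']
```

Using the ratio of the shorter to the longer protein keeps the check symmetric, so it applies equally to bovine genes searched against human and to human genes searched against cow.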