Zhu Wei, Buell C Robin
The Institute for Genomic Research, Rockville, Maryland 20850, USA.
Genome Res. 2007 Mar;17(3):299-310. doi: 10.1101/gr.5881807. Epub 2007 Feb 6.
Rice is an important model species for the Poaceae and other monocotyledonous plants. With the availability of a near-complete, finished, and annotated rice genome, we performed genome-level comparisons between rice and all plant species for which large genomic or transcriptomic data sets are available, to determine the utility of cross-species sequence for structural and functional annotation of the rice genome. Through comparative analyses with four plant genome sequence data sets and transcript assemblies from 185 plant species, we were able to confirm and improve the structural annotation of the rice genome. Our comparative analyses found support, in the form of a rice expressed sequence tag, full-length cDNA, or plant homolog, for 38,109 (89.3%) of the 42,653 nontransposable element-related genes in the rice genome. Although the majority of the putative homologs were obtained from Poaceae species, putative homologs were also identified in dicotyledonous angiosperms, gymnosperms, and other plants such as algae, moss, and fern. We identified a set of 7669 rice genes lacking a putative homolog; these may be lineage-specific genes that evolved after speciation and may have a role in species diversity. Our comparative alignments also identified improvements to the current rice gene structural annotation and revealed 487 genes that were most likely missed in the current rice genome annotation, plus another 500 genes flagged for structural annotation review. We also demonstrated the utility of cross-species comparative alignments for identifying noncoding sequences and for confirming gene nesting in rice.
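The headline percentage in the abstract follows directly from the two gene counts. The short Python sketch below reproduces it and derives two further figures that the abstract does not state explicitly: the number of genes with no support of any kind, and the share of genes lacking a putative homolog. It assumes that the 7669 no-homolog genes are counted against the same 42,653-gene set; the constant names and the `percent` helper are illustrative only and do not come from the paper.

```python
# Counts taken from the abstract (nontransposable element-related rice genes).
TOTAL_GENES = 42_653
SUPPORTED_GENES = 38_109   # support from a rice EST, full-length cDNA, or plant homolog
NO_HOMOLOG_GENES = 7_669   # candidate lineage-specific genes (no putative homolog)


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Fraction of genes with any form of support (the abstract reports 89.3%).
supported_pct = percent(SUPPORTED_GENES, TOTAL_GENES)

# Derived here, not stated in the abstract: genes with no support of any kind.
unsupported = TOTAL_GENES - SUPPORTED_GENES

# Derived here, assuming the same 42,653-gene denominator.
no_homolog_pct = percent(NO_HOMOLOG_GENES, TOTAL_GENES)

print(f"supported: {SUPPORTED_GENES}/{TOTAL_GENES} = {supported_pct}%")
print(f"no support of any kind: {unsupported}")
print(f"no putative homolog: {NO_HOMOLOG_GENES} ({no_homolog_pct}%)")
```

The two derived quantities measure different things. A gene without a plant homolog can still be supported by rice transcript evidence, so the 7669 no-homolog genes should not be read as genes lacking all support.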