Kasuga Takao, Mannhaupt Gertrud, Glass N Louise
Department of Plant and Microbial Biology, University of California, Berkeley, California, USA.
PLoS One. 2009 Apr 21;4(4):e5286. doi: 10.1371/journal.pone.0005286.
In the post-genome era, insufficient functional annotation of predicted genes greatly restricts the potential of mining genome data. We demonstrate that an evolutionary approach, which is independent of functional annotation, has great potential as a tool for genome analysis. We chose the genome of the model filamentous fungus Neurospora crassa as an example. The phylogenetic distribution of each predicted protein-coding gene (PCG) in the N. crassa genome was used to classify genes into six mutually exclusive lineage specificity (LS) groups: Eukaryote/Prokaryote-core, Dikarya-core, Ascomycota-core, Pezizomycotina-specific, N. crassa-orphans and Others. Functional category analysis revealed that only approximately 23% of PCGs in the two most lineage-specific groups, Pezizomycotina-specific and N. crassa-orphans, have functional annotation. In contrast, approximately 76% of PCGs in the remaining four LS groups have functional annotation. Analysis of chromosomal localization showed that N. crassa-orphan PCGs and genes encoding secreted proteins are enriched in subtelomeric regions. The origin of N. crassa-orphans is not known. We found that 11% of N. crassa-orphans have paralogous N. crassa-orphan genes. Of the paralogous N. crassa-orphan gene pairs, 33% were tandemly located in the genome, suggesting that these N. crassa-orphan PCGs arose by gene duplication. LS grouping is thus a useful tool to explore and understand genome organization, evolution and gene function in fungi.
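The LS grouping described in the abstract can be pictured as a rule-based classifier over the set of taxa in which a homolog of each gene is detected. The Python sketch below is only an illustration under stated assumptions: the abstract does not give the genomes compared, the similarity thresholds, or the exact decision rules, so the taxon labels, the function name classify_ls_group, the gene identifiers and the rules themselves are hypothetical. Only the six group names come from the text.

```python
from collections import Counter

# Hypothetical taxon labels (assumptions; not taken from the abstract).
PEZIZO = "pezizomycotina"          # other Pezizomycotina species
OTHER_ASCO = "other_ascomycota"    # Ascomycota outside Pezizomycotina
BASIDIO = "basidiomycota"
NON_FUNGAL_EUK = "non_fungal_eukaryote"
PROK = "prokaryote"


def classify_ls_group(hits):
    """Assign one of six mutually exclusive LS groups.

    `hits` is the set of taxon labels in which a homolog of the gene
    was detected. The rules below are illustrative assumptions.
    """
    hits = frozenset(hits)
    dikarya = {PEZIZO, OTHER_ASCO, BASIDIO}
    if not hits:
        # No homolog detected anywhere outside N. crassa.
        return "N. crassa-orphans"
    if hits == {PEZIZO}:
        return "Pezizomycotina-specific"
    if hits == {PEZIZO, OTHER_ASCO}:
        return "Ascomycota-core"
    if hits == dikarya:
        return "Dikarya-core"
    if dikarya <= hits and hits & {NON_FUNGAL_EUK, PROK}:
        # Conserved across Dikarya and also found outside the fungi sampled.
        return "Eukaryote/Prokaryote-core"
    # Patchy distributions that fit none of the patterns above.
    return "Others"


# Toy input with made-up gene identifiers.
example_hits = {
    "gene_A": set(),
    "gene_B": {PEZIZO},
    "gene_C": {PEZIZO, OTHER_ASCO},
    "gene_D": {PEZIZO, OTHER_ASCO, BASIDIO},
    "gene_E": {PEZIZO, OTHER_ASCO, BASIDIO, PROK},
    "gene_F": {BASIDIO},
}
groups = {gene: classify_ls_group(h) for gene, h in example_hits.items()}
group_counts = Counter(groups.values())
```

Because every gene falls through the rules in a fixed order and each rule returns immediately, each gene receives exactly one label, which is what makes the groups mutually exclusive. Summaries such as the fraction of annotated PCGs per group could then be computed from group_counts together with an annotation table.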