Song Hongtao, Lin Kui, Hu Jinglu, Pang Erli
MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China.
Graduate School of Information, Production and Systems, Waseda University, Kitakyushu-shi, Japan.
Front Plant Sci. 2018 Mar 15;9:325. doi: 10.3389/fpls.2018.00325. eCollection 2018.
Although the cucumber reference genome and its annotation were published several years ago, the functional annotation of predicted genes, particularly protein-coding genes, still requires further improvement. In general, accurately determining orthologous relationships between genes allows for better and more robust functional assignments of predicted genes. Collinearity information, one of the most reliable sources of evidence, can facilitate orthology inference among genes from multiple related genomes. Currently, collinear segments are identified mainly on the basis of conserved gene order and orientation. However, over the course of plant genome evolution, various evolutionary events have disrupted or distorted the order of genes along chromosomes, making it difficult to use those genes as genome-wide markers for plant genome comparisons. Using the localized LASTZ/MULTIZ analysis pipeline, we aligned 15 genomes, including cucumber and other related angiosperm plants, and identified a set of genomic segments that are short in length, stable in structure, uniform in distribution and highly conserved across all 15 plants. Compared with protein-coding genes, these conserved segments were more suitable as genomic markers for detecting collinear segments among distantly related plants. Guided by this set of identified collinear genomic segments, we inferred 94,486 orthologous protein-coding gene pairs (OPPs) between cucumber and 14 other angiosperm species. These OPPs were used as proxies for transferring functional terms to cucumber genes from the annotations of the other 14 genomes. In total, 10,885 protein-coding genes were assigned Gene Ontology (GO) terms, nearly 1,300 more than the number collected in the UniProt proteome database. Our results showed that annotation accuracy was improved compared with other existing approaches.
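The abstract does not spell out how collinear segments are called from the conserved-segment markers. The following Python sketch is a rough illustration only and is not the authors' pipeline. It assumes each conserved segment has already been located in both genomes. The `Anchor` record, the `max_gap` and `min_anchors` parameters, and the greedy gap-and-direction rule are all hypothetical, standing in for a generic anchor-chaining technique.

```python
from collections import defaultdict
from typing import List, NamedTuple


class Anchor(NamedTuple):
    """A conserved segment located in two genomes (hypothetical record layout)."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int


def collinear_blocks(anchors, max_gap=50_000, min_anchors=3) -> List[List[Anchor]]:
    """Greedily group anchors into collinear blocks (illustrative sketch).

    Anchors on the same chromosome pair are walked in genome-A order. A block
    is extended while the next anchor lies within max_gap in both genomes and
    keeps the same direction (forward or reverse) in genome B. Blocks with
    fewer than min_anchors anchors are discarded.
    """
    by_pair = defaultdict(list)
    for a in anchors:
        by_pair[(a.chrom_a, a.chrom_b)].append(a)

    blocks = []
    for pair_anchors in by_pair.values():
        pair_anchors.sort(key=lambda a: a.pos_a)
        current = [pair_anchors[0]]
        direction = 0  # +1 forward, -1 reverse, 0 not yet determined
        for nxt in pair_anchors[1:]:
            prev = current[-1]
            step_b = nxt.pos_b - prev.pos_b
            step_dir = (step_b > 0) - (step_b < 0)
            close = nxt.pos_a - prev.pos_a <= max_gap and abs(step_b) <= max_gap
            consistent = step_dir != 0 and (direction == 0 or step_dir == direction)
            if close and consistent:
                current.append(nxt)
                direction = step_dir
            else:
                # Close the current run and start a new one at this anchor.
                if len(current) >= min_anchors:
                    blocks.append(current)
                current = [nxt]
                direction = 0
        if len(current) >= min_anchors:
            blocks.append(current)
    return blocks
```

Because short conserved segments are denser and more evenly spread than genes, a simple chaining rule like this one has more anchors to work with. That density is the property the abstract highlights for distantly related genomes.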
In this study, we provided an alternative resource for the functional annotation of predicted cucumber protein-coding genes, which we expect will benefit biological studies of cucumber. The resource is accessible at http://cmb.bnu.edu.cn/functional_annotation. Using the cucumber reference genome as a case study, we also presented an efficient strategy for transferring gene functional information from well-characterized protein-coding genes in model species to newly sequenced or "non-model" plant species.
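The following Python sketch illustrates, in minimal form, how OPPs can serve as proxies for transferring GO terms. It assumes OPPs arrive as (query gene, species, ortholog) tuples and that the source annotations are a lookup table keyed by (species, gene). The `min_species` support threshold is a hypothetical filter and is not taken from the paper.

```python
from collections import defaultdict


def transfer_go_terms(opps, annotations, min_species=1):
    """Project GO terms onto query genes through orthologous pairs (sketch).

    opps: iterable of (query_gene, species, ortholog_gene) tuples.
    annotations: dict mapping (species, gene) to a set of GO term IDs.
    A term is kept for a query gene only if orthologs from at least
    `min_species` distinct species carry it (hypothetical filter).
    """
    # query gene -> GO term -> set of species whose ortholog carries the term
    support = defaultdict(lambda: defaultdict(set))
    for query_gene, species, ortholog in opps:
        for term in annotations.get((species, ortholog), ()):
            support[query_gene][term].add(species)

    transferred = {}
    for gene, terms in support.items():
        kept = {t for t, sp in terms.items() if len(sp) >= min_species}
        if kept:
            transferred[gene] = kept
    return transferred
```

With toy data, raising `min_species` trades coverage for stricter cross-species support. Only terms seen in orthologs from several species survive the higher threshold.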