Nishiyama Milton Yutaka, Ferreira Savio Siqueira, Tang Pei-Zhong, Becker Scott, Pörtner-Taliana Antje, Souza Glaucia Mendes
Departamento de Bioquímica, Universidade de São Paulo, São Paulo, SP, Brazil.
ThermoFisher Scientific, Carlsbad, California, United States of America.
PLoS One. 2014 Sep 15;9(9):e107351. doi: 10.1371/journal.pone.0107351. eCollection 2014.
Sugarcane is a major crop used for food and bioenergy production. Modern cultivars are hybrids derived from crosses between Saccharum officinarum and Saccharum spontaneum. Hybrid cultivars combine favorable characteristics from the ancestral species and have a highly polyploid and aneuploid genome of 100-130 chromosomes. These complex genomes pose a major challenge for molecular studies and for the development of biotechnological tools that can facilitate sugarcane improvement. Here, we describe full-length enriched cDNA libraries for Saccharum officinarum, Saccharum spontaneum, and one hybrid genotype (SP803280) and analyze the set of open reading frames (ORFs) in their genomes (i.e., their ORFeomes). We found 38,195 (19%) sugarcane-specific transcripts that did not match transcripts from other databases. Fewer than 1.6% of all transcripts were ancestor-specific (i.e., not expressed in SP803280). We also found 78,008 putative new sugarcane transcripts that were absent from the largest sugarcane expressed sequence tag database (SUCEST). Functional annotation showed a high frequency of protein kinases and stress-related proteins. We also detected natural antisense transcript expression, which mapped to 94% of all plant KEGG pathways; however, each genotype showed different pathways enriched in antisense transcripts. Our data appeared to cover 53.2% (17,563 genes) of all sugarcane full-length genes and 46.8% (937 transcription factors) of all sugarcane transcription factors. This work represents a significant advance in defining the sugarcane ORFeome and will be useful for protein characterization, single nucleotide polymorphism and splicing variant identification, evolutionary and comparative studies, and sugarcane genome assembly and annotation.
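The abstract reports several counts alongside percentages without stating the underlying totals. As a rough back-of-the-envelope check, the implied denominators can be recovered from each count and percentage pair. The sketch below is illustrative only: the totals it prints are not given in the source, the function name `implied_total` is hypothetical, and because the percentages are rounded (19% especially), the results are approximate.

```python
# Reported (count, percent) pairs taken from the abstract.
# The totals are NOT stated in the source; they are only implied
# by these rounded percentages, so treat the results as estimates.
reported = {
    "sugarcane-specific transcripts": (38_195, 19.0),
    "full-length genes covered": (17_563, 53.2),
    "transcription factors covered": (937, 46.8),
}


def implied_total(count: int, percent: float) -> int:
    """Back-calculate the total implied by 'count is percent% of total'."""
    return round(count / (percent / 100.0))


implied = {name: implied_total(c, p) for name, (c, p) in reported.items()}

for name, total in implied.items():
    print(f"{name}: implied total of about {total:,}")
```

Under these assumptions, the figures correspond to roughly 200,000 transcripts in the full set, about 33,000 sugarcane full-length genes, and about 2,000 transcription factors.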