Dong Lingli, Liu Hongfang, Zhang Juncheng, Yang Shuangjuan, Kong Guanyi, Chu Jeffrey S C, Chen Nansheng, Wang Daowen
The State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China.
Frasergen, Wuhan East Lake High-tech Zone, Wuhan, 430075, China.
BMC Genomics. 2015 Dec 9;16:1039. doi: 10.1186/s12864-015-2257-y.
The large and complex hexaploid genome of common wheat (Triticum aestivum, AABBDD) has greatly hindered genomics studies of this species. Here, we investigated transcripts in developing caryopses of common wheat using the emerging single-molecule real-time (SMRT) sequencing technology PacBio RSII, and assessed the resulting data for improving common wheat genome annotation and grain transcriptome research.
We obtained 197,709 full-length non-chimeric (FLNC) reads, 74.6% of which were estimated to carry a complete open reading frame. A total of 91,881 high-quality FLNC reads were identified and mapped to 16,188 chromosomal loci, corresponding to 13,162 known genes and 3,026 new genes not annotated previously. Although some FLNC reads could not be unambiguously mapped to the current draft genome sequence, many of them are likely to be useful for studying highly similar homoeologous or paralogous loci or for improving chromosomal contig assembly in further research. The 91,881 high-quality FLNC reads represented 22,768 unique transcripts, 9,591 of which were newly discovered. We found 180 transcripts each spanning two or three previously annotated adjacent loci, suggesting that these loci should be merged to form correct gene models. Finally, our data facilitated the identification of 6,030 genes differentially regulated during caryopsis development, and of full-length transcripts for 72 transcribed gluten gene members that are important for controlling the end-use quality of common wheat.
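The counts reported above can be cross-checked with simple arithmetic. The following is a minimal sketch rather than part of the authors' analysis: the variable names are hypothetical, and the derived quantities (estimated number of reads with a complete open reading frame, number of previously known transcripts, share of high-quality reads) are computed here on the assumption that each percentage and subtotal refers to the total stated in the same sentence.

```python
# Figures quoted in the abstract (variable names are illustrative only).
flnc_total = 197_709         # full-length non-chimeric reads
orf_fraction = 0.746         # share estimated to carry a complete ORF
hq_flnc = 91_881             # high-quality FLNC reads
loci_total = 16_188          # chromosomal loci with mapped reads
known_genes = 13_162         # loci matching annotated genes
new_genes = 3_026            # loci not annotated previously
transcripts_total = 22_768   # unique transcripts
new_transcripts = 9_591      # newly discovered transcripts

# Known plus new genes should account for all mapped loci.
assert known_genes + new_genes == loci_total

# Derived values (assumptions, not numbers stated in the abstract).
est_complete_orf = round(flnc_total * orf_fraction)
known_transcripts = transcripts_total - new_transcripts
hq_fraction = hq_flnc / flnc_total
transcripts_per_locus = transcripts_total / loci_total

print(f"Estimated FLNC reads with a complete ORF: {est_complete_orf:,}")
print(f"Transcripts not flagged as new: {known_transcripts:,}")
print(f"High-quality share of FLNC reads: {hq_fraction:.1%}")
print(f"Mean unique transcripts per locus: {transcripts_per_locus:.2f}")
```

The gene counts sum exactly to the number of loci, which is the only consistency check the abstract itself permits; the other values are rough derived estimates.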
Our work demonstrates the value of PacBio transcript sequencing for improving common wheat genome annotation by uncovering loci and full-length transcripts not discovered previously. The resource obtained may aid further structural genomics and grain transcriptome studies of common wheat.