A survey of the complex transcriptome from the highly polyploid sugarcane genome using full-length isoform sequencing and de novo assembly from short read sequencing.

Author information

Hoang Nam V, Furtado Agnelo, Mason Patrick J, Marquardt Annelie, Kasirajan Lakshmi, Thirugnanasambandam Prathima P, Botha Frederik C, Henry Robert J

Affiliations

Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Room 2.245, Level 2, The John Hay Building, Queensland Biosciences Precinct [#80], 306 Carmody Road, St. Lucia, QLD, 4072, Australia.

College of Agriculture and Forestry, Hue University, Hue, Vietnam.

Publication information

BMC Genomics. 2017 May 22;18(1):395. doi: 10.1186/s12864-017-3757-8.

Abstract

BACKGROUND

Despite the economic importance of sugarcane in sugar and bioenergy production, there is not yet a reference genome available. Most of the sugarcane transcriptomic studies have been based on Saccharum officinarum gene indices (SoGI), expressed sequence tags (ESTs) and de novo assembled transcript contigs from short-reads; hence knowledge of the sugarcane transcriptome is limited in relation to transcript length and number of transcript isoforms.

RESULTS

The sugarcane transcriptome was sequenced using PacBio isoform sequencing (Iso-Seq) of a pooled RNA sample derived from leaf, internode and root tissues at different developmental stages from 22 varieties, to explore the potential for capturing full-length transcript isoforms. A total of 107,598 unique transcript isoforms were obtained, representing about 71% of the total number of predicted sugarcane genes. The majority of this dataset (92%) matched the plant protein database, while just over 2% were novel transcripts and over 2% were putative long non-coding RNAs. About 56% and 23% of total sequences were annotated against the gene ontology and KEGG pathway databases, respectively. Comparison with de novo contigs from Illumina RNA-Sequencing (RNA-Seq) of the internode samples from the same experiment and public databases showed that the Iso-Seq method recovered more full-length transcript isoforms and had a higher N50 and a greater average length of the largest 1,000 proteins, whereas RNA-Seq captured a greater representation of the gene content and RNA diversity. Only 62% of PacBio transcript isoforms matched 67% of de novo contigs. The non-matched proportion of the PacBio isoforms was attributed to the inclusion of leaf/root tissues and the normalization in PacBio, and the non-matched proportion of the de novo contigs to the representation of more gene content and RNA classes in the de novo assembly. About 69% of PacBio transcript isoforms and 41% of de novo contigs aligned with the sorghum genome, indicating high conservation of orthologs in the genic regions of the two genomes.

CONCLUSIONS

The transcriptome dataset should contribute to improved sugarcane gene models and sugarcane protein predictions, and will serve as a reference database for analysis of transcript expression in sugarcane.
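
The Results compare the Iso-Seq and de novo datasets by N50, among other measures. As a reminder of what that statistic means, the following is a minimal sketch of the usual definition: the length L such that sequences of length L or longer account for at least half of the total bases. The function name `n50` and the toy length lists are illustrative assumptions, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of the total bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy lengths invented for illustration only (not values from the paper).
toy_isoforms = [4000, 3000, 2500, 1500, 1000]
toy_contigs = [2000, 1000, 800, 600, 400, 200]

print("toy isoform N50:", n50(toy_isoforms))
print("toy contig N50:", n50(toy_contigs))
```

A set dominated by longer sequences yields a higher N50, which is why the statistic is commonly read as a rough indicator of transcript completeness when two assemblies are compared.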

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3fd/5440902/115bfcdb68bb/12864_2017_3757_Fig1_HTML.jpg
