Improved annotation with de novo transcriptome assembly in four social amoeba species.

Author information

Singh Reema, Lawal Hajara M, Schilde Christina, Glöckner Gernot, Barton Geoffrey J, Schaap Pauline, Cole Christian

Affiliations

Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, UK.

Cell and Development Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, UK.

Publication information

BMC Genomics. 2017 Jan 31;18(1):120. doi: 10.1186/s12864-017-3505-0.

Abstract

BACKGROUND

Annotation of gene models and transcripts is a fundamental step in genome sequencing projects. Often this is performed with automated prediction pipelines, which can miss complex and atypical genes or transcripts. RNA sequencing (RNA-seq) data can support the annotation with empirical evidence. Here we present de novo transcriptome assemblies generated from RNA-seq data in four Dictyostelid species: D. discoideum, P. pallidum, D. fasciculatum and D. lacteum. The assemblies were combined with existing gene models to determine corrections and improvements on a whole-genome scale. This is the first time this has been performed in these eukaryotic species.

RESULTS

An initial de novo transcriptome assembly was generated by Trinity for each species and then refined with the Program to Assemble Spliced Alignments (PASA). Completeness and quality were assessed at each stage of assembly with the Benchmarking Universal Single-Copy Orthologs (BUSCO) and Transrate tools. The final datasets of 11,315-12,849 transcripts contained 5,610-7,712 updates and corrections to >50% of existing gene models, including changes to hundreds or thousands of protein products. Putative novel genes were also identified, and alternative splice isoforms were observed for the first time in P. pallidum, D. lacteum and D. fasciculatum.
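
As a rough illustration of what a per-stage assembly check can look like, the sketch below computes a few basic contiguity statistics (transcript count, total bases, N50, longest transcript) from FASTA-formatted text. It is not the authors' pipeline and does not reproduce what BUSCO or Transrate measure; the function names and the toy sequences are hypothetical and chosen only for this example.

```python
from io import StringIO
from typing import Dict, Iterable, List


def read_fasta_lengths(handle: Iterable[str]) -> List[int]:
    """Return the length of every sequence in a FASTA stream."""
    lengths: List[int] = []
    current = None  # stays None until the first header line is seen
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summarize(lengths: List[int]) -> Dict[str, int]:
    """Collect simple contiguity statistics for one assembly stage."""
    return {
        "transcripts": len(lengths),
        "total_bases": sum(lengths),
        "n50": n50(lengths),
        "longest": max(lengths, default=0),
    }


# Toy input (hypothetical sequences, not data from the paper).
toy = StringIO(">t1\nACGTACGTAC\n>t2\nACGT\nAC\n>t3\nACG\n")
stats = summarize(read_fasta_lengths(toy))
print(stats)
```

Running the same summary on the output of each stage (for example, the initial assembly and the refined one) gives a quick side-by-side view, although contiguity alone says nothing about completeness or correctness, which is why dedicated tools are used for that purpose.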

CONCLUSIONS

By taking a whole-transcriptome approach to genome annotation with empirical data, we have been able to enrich the annotations of four existing genome sequencing projects. In doing so, we have identified updates to the majority of the gene annotations across all four species under study and found putative novel genes and transcripts that could be worthy of follow-up. The new transcriptome data we present here will be a valuable resource for genome curators in the Dictyostelia, and we propose this effective methodology for use in other genome annotation projects.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c72/5282741/9e046912c603/12864_2017_3505_Fig1_HTML.jpg
