Epigenetics Institute and Department of Cell and Developmental Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA.
Department of Urology and Institute of Neuropathology, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany.
BMC Biol. 2021 Nov 27;19(1):254. doi: 10.1186/s12915-021-01188-w.
Functional genomic analyses rely on high-quality genome assemblies and annotations. Highly contiguous genome assemblies have become available for a variety of species, but accurate and complete annotation of gene models, inclusive of alternative splice isoforms and transcription start and termination sites, remains difficult with traditional approaches.
Here, we utilized full-length isoform sequencing (Iso-Seq), a long-read RNA sequencing technology, to obtain a comprehensive annotation of the transcriptome of the ant Harpegnathos saltator. The improved genome annotations include additional splice isoforms and extended 3' untranslated regions for more than 4000 genes. Reanalysis of RNA-seq experiments using these annotations revealed several genes with caste-specific differential expression and tissue- or caste-specific splicing patterns that were missed in previous analyses. The extended 3' untranslated regions afforded great improvements in the analysis of existing single-cell RNA-seq data, resulting in the recovery of the transcriptomes of 18% more cells. The deeper single-cell transcriptomes obtained with these new annotations allowed us to identify additional markers for several cell types in the ant brain, as well as genes differentially expressed across castes in specific cell types.
Our results demonstrate that Iso-Seq is an efficient and effective approach to improve genome annotations and maximize the amount of information that can be obtained from existing and future genomic datasets in Harpegnathos and other organisms.