Department of Genetics, University of Georgia, Athens, Georgia 30602, USA.
Department of Biology, University of Texas Arlington, Arlington, Texas 76019, USA.
Genome Res. 2021 Aug;31(8):1486-1497. doi: 10.1101/gr.274282.120. Epub 2021 Jun 15.
Alternate isoforms are important contributors to phenotypic diversity across eukaryotes. Although short-read RNA-sequencing has increased our understanding of isoform diversity, it is challenging to accurately detect full-length transcripts, preventing the identification of many alternate isoforms. Long-read sequencing technologies have made it possible to sequence full-length alternative transcripts, accurately characterizing alternative splicing events, alternate transcription start and end sites, and differences in UTRs. Here, we use Pacific Biosciences (PacBio) long-read RNA-sequencing (Iso-Seq) to examine the transcriptomes of five organs in threespine stickleback fish (Gasterosteus aculeatus), a widely used genetic model species. The threespine stickleback fish has a refined genome assembly in which gene annotations are based on short-read RNA sequencing and predictions from coding sequences of other species. This suggests some of the existing annotations may be inaccurate or alternative transcripts may not be fully characterized. Using Iso-Seq, we detected thousands of novel isoforms, indicating many isoforms are absent from the current Ensembl gene annotations. In addition, we refined many of the existing annotations within the genome. We noted many improperly positioned transcription start sites that were refined with long-read sequencing. The Iso-Seq-predicted transcription start sites were more accurate and verified through ATAC-seq. We also detected many alternative splicing events between sexes and across organs. We found a substantial number of genes in both somatic and gonadal samples that had sex-specific isoforms. Our study highlights the power of long-read sequencing to study the complexity of transcriptomes, greatly improving genomic resources for the threespine stickleback fish.