Department of Bacterial Infections, Research Institute for Microbial Diseases, Osaka University, Osaka, Japan.
Department of Infection Metagenomics, Research Institute for Microbial Diseases, Osaka University, Osaka, Japan.
Methods Mol Biol. 2022;2477:79-89. doi: 10.1007/978-1-0716-2257-5_6.
Genome annotation relies mainly on computational prediction, but its accuracy is low: untranslated regions are not identified, complex isoforms are not predicted correctly, and the discovery rate of noncoding RNA is low. RNA-seq has revolutionized transcriptome reconstruction over the last decade. However, the fragmentation step in cDNA sequencing causes information loss, so transcripts must be assembled and reconstructed, which reduces the accuracy of the reconstructed transcriptome. Recently, long-read sequencing technologies such as Oxford Nanopore sequencing have been introduced. cDNA is sequenced directly without fragmentation, producing long reads that do not need to be assembled; this keeps the transcript structure intact and increases the accuracy of transcriptome reconstruction. Here we present a protocol and a pipeline to reconstruct the transcriptome of compact genomes, including those of yeasts. The protocol involves generating full-length cDNA and using an Oxford Nanopore ligation-based sequencing kit to sequence multiple samples in the same run. The pipeline (1) strands the generated long reads, (2) corrects the reads by mapping them to the reference genome, (3) identifies transcripts, including their 5'UTR and 3'UTR, (4) profiles the isoforms, filtering out artifacts that result from low sequencing accuracy, and (5) improves the accuracy of provided annotations. Using long reads improves the accuracy of transcriptome reconstruction and helps in discovering a significant number of novel RNAs.
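As an illustration of what step (1), stranding, can look like, the following is a minimal Python sketch. It is not the authors' pipeline: it assumes a simple heuristic in which a read ending in a poly(A) run is taken as sense-oriented and a read starting with a poly(T) run is taken as antisense and reverse-complemented. The function names `reverse_complement` and `strand_read`, and the thresholds `min_run` and `window`, are hypothetical choices made for this example only.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def strand_read(seq: str, min_run: int = 12, window: int = 30):
    """Orient a cDNA read using a poly(A)/poly(T) heuristic (assumed, illustrative).

    Returns (oriented_sequence, strand), where strand is '+', '-', or '?'.
    min_run and window are arbitrary illustrative thresholds.
    """
    head = seq[:window].upper()   # first `window` bases of the read
    tail = seq[-window:].upper()  # last `window` bases of the read
    has_polya_tail = "A" * min_run in tail
    has_polyt_head = "T" * min_run in head

    if has_polya_tail and not has_polyt_head:
        return seq, "+"                      # already in sense orientation
    if has_polyt_head and not has_polya_tail:
        return reverse_complement(seq), "-"  # flip antisense reads
    return seq, "?"                          # ambiguous: leave unchanged


if __name__ == "__main__":
    body = "ACGTGCATGCCGTAGCTAGGCTTACGATCGGATCCATGCA"  # toy transcript body
    sense = body + "A" * 15
    antisense = reverse_complement(sense)
    for read in (sense, antisense, body):
        oriented, strand = strand_read(read)
        print(strand, oriented)
```

Reads without a clear signal are labeled `?` rather than guessed, so a downstream step can discard or rescue them. A real workflow would also need to handle adapter and primer sequences flanking the poly(A) tract, which this sketch ignores.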