Bioinformatics Core Facility, Ben-Gurion University of the Negev, Beer-Sheva, Israel.
Methods Mol Biol. 2021;2243:143-167. doi: 10.1007/978-1-0716-1103-6_8.
RNA-Seq is now an indispensable approach for comparative transcriptome profiling in model and nonmodel organisms. Analyzing RNA-Seq data from nonmodel organisms poses unique challenges, owing to the lack of a high-quality reference genome and the relative scarcity of tools for downstream functional analyses. In this chapter, we provide an overview of the analysis steps in RNA-Seq projects of nonmodel organisms, elaborating on the aspects that are unique to this analysis. These include (1) strategic decisions that have to be made in advance regarding the sequencing technology and the reference to use; (2) how to search for available draft genomes and, if necessary, how to improve their gene prediction and annotation; (3) how to clean raw reads before de novo assembly; (4) how to separate the reads in RNA-Seq projects of symbiont organisms; (5) how to design and carry out a de novo transcriptome assembly that is comprehensive and reliable; (6) how to assess transcriptome quality; (7) when and how to reduce redundancy in the transcriptome; (8) techniques and considerations in transcriptome functional annotation; (9) how to quantify transcript abundance in the face of high transcriptome redundancy; and, most importantly, (10) how to perform functional enrichment testing using available tools that either support a wide range of species or enable a universal, non-species-specific analysis.
Throughout the chapter, we refer to a variety of useful software tools. For the initial analysis steps, which involve high-volume data, these include Linux-based programs. For the later steps, we describe both Linux programs and R packages for advanced users, as well as many user-friendly tools for nonprogrammers. Finally, we present a full workflow for RNA-Seq analysis of nonmodel organisms using the NeatSeq-Flow platform, which can be used locally through a user-friendly interface.