Challenges and strategies in transcriptome assembly and differential gene expression quantification. A comprehensive in silico assessment of RNA-seq experiments.

Author information

Department of Evolutionary Biology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden.

Publication information

Mol Ecol. 2013 Feb;22(3):620-34. doi: 10.1111/mec.12014. Epub 2012 Sep 24.

Abstract

Transcriptome Shotgun Sequencing (RNA-seq) has been readily embraced by geneticists and molecular ecologists alike. As with all high-throughput technologies, it is critical to understand which analytic strategies are best suited and which parameters may bias the interpretation of the data. Here we use a comprehensive simulation approach to explore how various features of the transcriptome (complexity, degree of polymorphism π, alternative splicing), technological processing (sequencing error ε, library normalization) and bioinformatic workflow (de novo vs. mapping assembly, reference genome quality) impact transcriptome quality and inference of differential gene expression (DE). We find that transcriptome assembly and gene expression profiling (EdgeR vs. BaySeq software) work well even in the absence of a reference genome and are robust across a broad range of parameters. We advise against library normalization and in most situations advocate mapping assemblies to an annotated genome of a divergent sister clade, which generally outperformed de novo assembly (Trans-Abyss, Trinity, Soapdenovo-Trans). Transcriptome complexity (size, paralogs, alternative splicing isoforms) negatively affected the assembly and DE profiling, whereas the effects of sequencing error and polymorphism were almost negligible. Finally, we highlight the challenge of gene name assignment for de novo assemblies and the importance of mapping strategies, and we raise awareness of challenges associated with the quality of reference genomes. Overall, our results have significant practical and methodological implications and can provide guidance in the design and analysis of RNA-seq experiments, particularly for organisms where genomic background information is lacking.
