Challenges and advances for transcriptome assembly in non-model species.

Author information

Ungaro Arnaud, Pech Nicolas, Martin Jean-François, McCairns R J Scott, Mévy Jean-Philippe, Chappaz Rémi, Gilles André

Affiliations

UMR 7263, Équipe Évolution Génome Environnement, Aix Marseille Université, CNRS, IRD, IMBE, Marseille, France.

UMR Centre de Biologie pour la Gestion des Populations, Montpellier SupAgro, Montferrier-sur-Lez, France.

Publication information

PLoS One. 2017 Sep 20;12(9):e0185020. doi: 10.1371/journal.pone.0185020. eCollection 2017.

DOI:10.1371/journal.pone.0185020

PMID:28931057

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5607178/

Abstract

Analyses of high-throughput transcriptome sequences of non-model organisms are based on two main approaches: de novo assembly and genome-guided assembly using mapping to assign reads prior to assembly. Given the limits of mapping reads to a reference when it is highly divergent, as is frequently the case for non-model species, we evaluate whether blastn outperforms mapping methods for read assignment in such situations (>15% divergence). We demonstrate its high performance by using simulated reads of lengths corresponding to those generated by the most common sequencing platforms, and over a realistic range of genetic divergence (0% to 30% divergence). Here we focus on gene identification and not on resolving the whole set of transcripts (i.e. the complete transcriptome). For simulated datasets, the transcriptome-guided assembly based on blastn recovers 94.8% of genes irrespective of read length at 0% divergence; however, the read assignment rate declines with both increasing divergence and decreasing read length. Nevertheless, we still observe 92.6% of recovered genes at 30% divergence irrespective of read length. This analysis also produces a categorization of genes relative to their assignment, and suggests guidelines for data processing prior to analyses of comparative transcriptomics and gene expression to minimize potential inferential bias associated with incorrect transcript assignment. We also compare the performance of de novo assembly alone versus in combination with a transcriptome-guided assembly based on blastn, both via simulation and empirically, using data from a cyprinid fish species and from an oak species. For any simulated scenario, the transcriptome-guided assembly using blastn outperforms the de novo approach alone, including when the divergence level is beyond the reach of traditional mapping methods. Combining de novo assembly and a related reference transcriptome for read assignment also addresses the bias/error in contigs caused by the dependence on a related reference alone. Empirical data corroborate these findings when assembling transcriptomes from the two non-model organisms: Parachondrostoma toxostoma (fish) and Quercus pubescens (plant). For the fish species, out of the 31,944 genes known from D. rerio, the guided and de novo assemblies recover 20,605 and 20,032 genes, respectively, but the guided assembly performs much better on both the contiguity and completeness metrics. For the oak, out of the 29,971 genes known from Vitis vinifera, the transcriptome-guided and de novo assemblies display similar performance, but the new guided approach detects 16,326 genes whereas the de novo assembly detects only 9,385.
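The read-assignment step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes blastn hits are available in the standard 12-column tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), keeps the best-scoring hit per read, and bins reads by reference gene so that each bin could then be assembled separately. The function names `assign_reads` and `recovered_fraction`, the identity and e-value cut-offs, and the toy hits are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def assign_reads(blast_lines, min_identity=70.0, max_evalue=1e-5):
    """Bin reads by the reference gene of their best-scoring hit.

    blast_lines: iterable of tab-separated lines in the 12-column BLAST
    tabular layout. The two thresholds are illustrative assumptions,
    not values taken from the paper.
    """
    best = {}  # read id -> (bitscore, gene id)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        read_id, gene_id = fields[0], fields[1]
        identity = float(fields[2])
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if identity < min_identity or evalue > max_evalue:
            continue  # hit too weak to support an assignment
        if read_id not in best or bitscore > best[read_id][0]:
            best[read_id] = (bitscore, gene_id)

    bins = defaultdict(list)
    for read_id, (_, gene_id) in best.items():
        bins[gene_id].append(read_id)
    return dict(bins)


def recovered_fraction(bins, n_reference_genes):
    """Fraction of reference genes that received at least one read."""
    return len(bins) / n_reference_genes


# Toy hits (made up): read1 matches two genes, read3 only has a weak hit.
hits = [
    "read1\tgeneA\t92.0\t100\t8\t0\t1\t100\t1\t100\t1e-40\t150",
    "read1\tgeneB\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-30\t120",
    "read2\tgeneB\t75.0\t100\t25\t0\t1\t100\t1\t100\t1e-10\t80",
    "read3\tgeneC\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-2\t30",
]
bins = assign_reads(hits)
print(bins)
print(recovered_fraction(bins, 4))
```

In this toy input, read1 is assigned to geneA (its higher bit score) and read3 is left unassigned, so two of four hypothetical reference genes receive reads. For scale, the 20,605 of 31,944 D. rerio genes reported in the abstract correspond to roughly 64.5% of the reference gene set.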

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6d4e/5607178/302040d80529/pone.0185020.g001.jpg

Similar articles

Comparative performance of transcriptome assembly methods for non-model organisms.

BMC Genomics. 2016 Jul 27;17:523. doi: 10.1186/s12864-016-2923-8.

RNA-seq analysis of Quercus pubescens Leaves: de novo transcriptome assembly, annotation and functional markers development.

PLoS One. 2014 Nov 13;9(11):e112487. doi: 10.1371/journal.pone.0112487. eCollection 2014.

PARRoT- a homology-based strategy to quantify and compare RNA-sequencing from non-model organisms.

BMC Bioinformatics. 2016 Dec 22;17(Suppl 19):513. doi: 10.1186/s12859-016-1366-1.

A survey of the complex transcriptome from the highly polyploid sugarcane genome using full-length isoform sequencing and de novo assembly from short read sequencing.

BMC Genomics. 2017 May 22;18(1):395. doi: 10.1186/s12864-017-3757-8.

The central nervous system transcriptome of the weakly electric brown ghost knifefish (Apteronotus leptorhynchus): de novo assembly, annotation, and proteomics validation.

BMC Genomics. 2015 Mar 11;16(1):166. doi: 10.1186/s12864-015-1354-2.

Reference-guided de novo assembly approach improves genome reconstruction for related species.

BMC Bioinformatics. 2017 Nov 10;18(1):474. doi: 10.1186/s12859-017-1911-6.

454 pyrosequencing-based analysis of gene expression profiles in the amphipod Melita plumulosa: transcriptome assembly and toxicant induced changes.

Aquat Toxicol. 2014 Aug;153:73-88. doi: 10.1016/j.aquatox.2013.11.022. Epub 2013 Dec 12.

Assembly and annotation of a non-model gastropod (Nerita melanotragus) transcriptome: a comparison of de novo assemblers.

BMC Res Notes. 2014 Aug 1;7:488. doi: 10.1186/1756-0500-7-488.

Optimization of de novo transcriptome assembly from next-generation sequencing data.

Genome Res. 2010 Oct;20(10):1432-40. doi: 10.1101/gr.103846.109. Epub 2010 Aug 6.

Cited by

De novo transcriptome assembly and analysis during agarwood induction in Gyrinops versteegii Gilg. seedling.

Sci Rep. 2025 Jan 23;15(1):2977. doi: 10.1038/s41598-025-87486-7.

sc-SPLASH provides ultra-efficient reference-free discovery in barcoded single-cell sequencing.

bioRxiv. 2024 Dec 24:2024.12.24.630263. doi: 10.1101/2024.12.24.630263.

Optimizing an efficient ensemble approach for high-quality de novo transcriptome assembly of Thymus daenensis.

Sci Rep. 2023 Jul 31;13(1):12415. doi: 10.1038/s41598-023-39620-6.

Breaking the reproductive barrier of divergent species to explore the genomic landscape.

Front Genet. 2022 Sep 23;13:963341. doi: 10.3389/fgene.2022.963341. eCollection 2022.

Comparative Transcriptome Analyses of Different Tissues Reveal Differentially Expressed Genes Associated with Anthraquinone, Catechin, and Gallic Acid Biosynthesis.

Genes (Basel). 2022 Sep 5;13(9):1592. doi: 10.3390/genes13091592.

Proteotranscriptomics - A facilitator in omics research.

Comput Struct Biotechnol J. 2022 Jul 9;20:3667-3675. doi: 10.1016/j.csbj.2022.07.007. eCollection 2022.

Insights into the species evolution of copepods in the northern seas revealed by transcriptome sequencing.

Ecol Evol. 2022 Feb 22;12(2):e8606. doi: 10.1002/ece3.8606. eCollection 2022 Feb.

Modern Approaches for Transcriptome Analyses in Plants.

Adv Exp Med Biol. 2021;1346:11-50. doi: 10.1007/978-3-030-80352-0_2.

Large-Scale Multiplexing Permits Full-Length Transcriptome Annotation of 32 Bovine Tissues From a Single Nanopore Flow Cell.

Front Genet. 2021 May 20;12:664260. doi: 10.3389/fgene.2021.664260. eCollection 2021.

Best practices on the differential expression analysis of multi-species RNA-seq.

Genome Biol. 2021 Apr 29;22(1):121. doi: 10.1186/s13059-021-02337-8.
