Evaluating characteristics of de novo assembly software on 454 transcriptome data: a simulation approach.

Affiliation

Evolutionary Bioinformatics, Institute for Evolution and Biodiversity, Westfaelische-Wilhelms-University, Muenster, Germany.

Publication

PLoS One. 2012;7(2):e31410. doi: 10.1371/journal.pone.0031410. Epub 2012 Feb 27.

DOI: 10.1371/journal.pone.0031410

PMID: 22384018

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3288049/

Abstract

BACKGROUND

The quantity of transcriptome data is rapidly increasing for non-model organisms. As sequencing technology advances, focus shifts towards solving bioinformatic challenges, of which sequence read assembly is the first task. Recent studies have compared the performance of different software to establish a best practice for transcriptome assembly. Here, we adapted a simulation approach to evaluate specific features of assembly programs on 454 data. The novelty of our study is that the simulation allows us to calculate a model assembly as reference point for comparison.
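
The key property of the simulation approach is that every read carries its true origin, so the ideal ("model") assembly is known before any assembler runs. The Python sketch below illustrates this idea under several assumptions. Reads are error-free and sampled uniformly from a set of known transcripts. Real 454 simulation would also model homopolymer errors and uneven expression, and this sketch ignores both. The names `SimRead`, `simulate_reads` and `model_contigs` are hypothetical, and so is the rule that overlapping reads from the same transcript form one model contig. None of this is the authors' actual pipeline.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SimRead:
    read_id: str
    transcript_id: str  # true source transcript, known because the read is simulated
    start: int          # 0-based start position within that transcript
    seq: str


def simulate_reads(transcripts, n_reads, mean_len=400, sd_len=50, seed=0):
    """Sample error-free reads uniformly from non-empty transcripts, recording each origin.

    Hypothetical simplification: no sequencing errors, equal expression for all
    transcripts, and normally distributed read lengths.
    """
    rng = random.Random(seed)
    ids = list(transcripts)
    reads = []
    for i in range(n_reads):
        tid = rng.choice(ids)
        tseq = transcripts[tid]
        # Clamp the sampled length to [1, transcript length].
        length = max(1, min(len(tseq), int(rng.gauss(mean_len, sd_len))))
        start = rng.randint(0, len(tseq) - length)
        reads.append(SimRead(f"r{i}", tid, start, tseq[start:start + length]))
    return reads


def model_contigs(reads, min_overlap=1):
    """Assumed model assembly: merge reads from the same transcript whose
    half-open intervals overlap by at least `min_overlap` bases.

    Returns {transcript_id: [(start, end), ...]} sorted by start.
    The default `min_overlap` is arbitrary, not a value from the paper.
    """
    by_tx = {}
    for r in reads:
        by_tx.setdefault(r.transcript_id, []).append((r.start, r.start + len(r.seq)))
    result = {}
    for tid, ivs in by_tx.items():
        ivs.sort()
        merged = [list(ivs[0])]
        for s, e in ivs[1:]:
            if merged[-1][1] - s >= min_overlap:  # enough overlap: extend current contig
                merged[-1][1] = max(merged[-1][1], e)
            else:                                 # gap or too little overlap: new contig
                merged.append([s, e])
        result[tid] = [tuple(m) for m in merged]
    return result
```

Calling `model_contigs(simulate_reads(...))` yields, per transcript, the stretches that a perfect assembler could reconstruct from the given reads. Contigs from real assemblers can then be compared against these stretches.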

FINDINGS

The simulation approach allows us to compare basic metrics of assemblies computed by different software applications (CAP3, MIRA, Newbler, and Oases) to a known optimal solution. We found that MIRA and CAP3 are conservative in merging reads, which resulted in a comparatively high number of short contigs. In contrast, Newbler more readily merged reads into longer contigs, while Oases produced the overall shortest assembly. Because the reads were simulated, each read could be traced back to its correct placement within the transcriptome. Combined with mapping the reads onto the assembled contigs, this allowed us to evaluate ambiguity in the assemblies. This analysis further supported the conservative nature of MIRA and CAP3, which resulted in low proportions of chimeric contigs but high redundancy. Newbler produced less redundancy, but its proportion of chimeric contigs was higher.
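
If each read's true transcript is known and reads have been mapped back to the contigs, ambiguity can be tallied per contig and per transcript. The sketch below uses dict-based inputs and two hypothetical definitions. A contig counts as chimeric if its reads come from more than one source transcript. A transcript counts as split if its reads land in more than one contig. The split count is only a crude proxy for redundancy, because it also counts plain fragmentation. The function name `ambiguity_summary` is illustrative; it is not the authors' code.

```python
from collections import defaultdict


def ambiguity_summary(read_origin, read_to_contig):
    """Tally chimeric contigs and split transcripts.

    read_origin:    {read_id: transcript_id}, known from the simulation
    read_to_contig: {read_id: contig_id}, e.g. from mapping reads onto contigs;
                    unmapped reads are simply absent. Assumes every mapped read
                    appears in read_origin.

    Hypothetical definitions:
      chimeric contig:   reads from more than one source transcript
      split transcript:  reads spread over more than one contig
                         (crude redundancy proxy; also counts fragmentation)
    """
    contig_sources = defaultdict(set)      # contig -> set of source transcripts
    transcript_contigs = defaultdict(set)  # transcript -> set of contigs
    for read_id, contig_id in read_to_contig.items():
        tid = read_origin[read_id]
        contig_sources[contig_id].add(tid)
        transcript_contigs[tid].add(contig_id)

    n_contigs = len(contig_sources)
    chimeric = sum(1 for sources in contig_sources.values() if len(sources) > 1)
    split = sum(1 for contigs in transcript_contigs.values() if len(contigs) > 1)
    return {
        "n_contigs": n_contigs,
        "chimeric_fraction": chimeric / n_contigs if n_contigs else 0.0,
        "split_transcripts": split,
    }
```

Under these definitions, the conservative behaviour reported for MIRA and CAP3 would be expected to show up as a low chimeric fraction with many split transcripts. Newbler's more liberal merging would be expected to show the reverse pattern.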

CONCLUSION

Our evaluation of four assemblers suggested that MIRA and Newbler slightly outperformed the other programs, while showing contrasting characteristics. Oases did not perform very well on the 454 reads. Our evaluation indicated that the software was either conservative (MIRA) or liberal (Newbler) about merging reads into contigs. This suggests that, in choosing an assembly program, researchers should carefully consider their follow-up analyses and the consequences of the chosen assembly approach.

Similar articles

Comparing de novo assemblers for 454 transcriptome data.

BMC Genomics. 2010 Oct 16;11:571. doi: 10.1186/1471-2164-11-571.

De novo transcriptome assembly for a non-model species, the blood-sucking bug Triatoma brasiliensis, a vector of Chagas disease.

Genetica. 2015 Apr;143(2):225-39. doi: 10.1007/s10709-014-9790-5. Epub 2014 Sep 19.

Short read Illumina data for the de novo assembly of a non-model snail species transcriptome (Radix balthica, Basommatophora, Pulmonata), and a comparison of assembler performance.

BMC Genomics. 2011 Jun 16;12:317. doi: 10.1186/1471-2164-12-317.

Challenges and advances for transcriptome assembly in non-model species.

PLoS One. 2017 Sep 20;12(9):e0185020. doi: 10.1371/journal.pone.0185020. eCollection 2017.

Comparison of De Novo Transcriptome Assemblers and k-mer Strategies Using the Killifish, Fundulus heteroclitus.

PLoS One. 2016 Apr 7;11(4):e0153104. doi: 10.1371/journal.pone.0153104. eCollection 2016.

Comparative performance of transcriptome assembly methods for non-model organisms.

BMC Genomics. 2016 Jul 27;17:523. doi: 10.1186/s12864-016-2923-8.

Comparison of different assembly and annotation tools on analysis of simulated viral metagenomic communities in the gut.

BMC Genomics. 2014 Jan 18;15:37. doi: 10.1186/1471-2164-15-37.

Assembly and annotation of a non-model gastropod (Nerita melanotragus) transcriptome: a comparison of de novo assemblers.

BMC Res Notes. 2014 Aug 1;7:488. doi: 10.1186/1756-0500-7-488.

Evaluation of short read metagenomic assembly.

BMC Genomics. 2011;12 Suppl 2(Suppl 2):S8. doi: 10.1186/1471-2164-12-S2-S8. Epub 2011 Jul 27.

Cited by

Development of transcriptome assembly and SSRs in allohexaploid Brassica with functional annotations and identification of heat-shock proteins for thermotolerance.

Front Genet. 2022 Sep 16;13:958217. doi: 10.3389/fgene.2022.958217. eCollection 2022.

Raw transcriptomics data to gene specific SSRs: a validated free bioinformatics workflow for biologists.

Sci Rep. 2020 Oct 26;10(1):18236. doi: 10.1038/s41598-020-75270-8.

Distinct Gut Virome Profile of Pregnant Women With Type 1 Diabetes in the ENDIA Study.

Open Forum Infect Dis. 2019 Jan 16;6(2):ofz025. doi: 10.1093/ofid/ofz025. eCollection 2019 Feb.

High-Throughput Sequencing to Investigate Phytopathogenic Fungal Propagules Caught in Baited Insect Traps.

J Fungi (Basel). 2019 Feb 12;5(1):15. doi: 10.3390/jof5010015.

De novo transcriptome sequencing and assembly from apomictic and sexual Eragrostis curvula genotypes.

PLoS One. 2017 Nov 1;12(11):e0185595. doi: 10.1371/journal.pone.0185595. eCollection 2017.

Genomic and transcriptomic analyses reveal distinct biological functions for cold shock proteins (VpaCspA and VpaCspD) in Vibrio parahaemolyticus CHN25 during low-temperature survival.

BMC Genomics. 2017 Jun 5;18(1):436. doi: 10.1186/s12864-017-3784-5.

Exploring the heat-responsive chaperones and microsatellite markers associated with terminal heat stress tolerance in developing wheat.

Funct Integr Genomics. 2017 Nov;17(6):621-640. doi: 10.1007/s10142-017-0560-1. Epub 2017 Jun 1.

Improving transcriptome de novo assembly by using a reference genome of a related species: Translational genomics from oil palm to coconut.

PLoS One. 2017 Mar 23;12(3):e0173300. doi: 10.1371/journal.pone.0173300. eCollection 2017.

Comparative immunogenomics of molluscs.

Dev Comp Immunol. 2017 Oct;75:3-15. doi: 10.1016/j.dci.2017.03.013. Epub 2017 Mar 18.

Transcriptome analysis of reveals candidate genes involved in important secondary metabolic pathways of phenylpropanoids and flavonoids.

PeerJ. 2017 Feb 28;5:e2938. doi: 10.7717/peerj.2938. eCollection 2017.
