Comparing de novo assemblers for 454 transcriptome data.

Affiliation

Institute of Evolutionary Biology, University of Edinburgh, West Mains Road, Edinburgh EH9 3JT, UK.

Publication information

BMC Genomics. 2010 Oct 16;11:571. doi: 10.1186/1471-2164-11-571.

DOI: 10.1186/1471-2164-11-571
PMID: 20950480
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3091720/
Abstract

BACKGROUND

Roche 454 pyrosequencing has become a method of choice for generating transcriptome data from non-model organisms. Once the tens to hundreds of thousands of short (250-450 base) reads have been produced, it is important to correctly assemble these to estimate the sequence of all the transcripts. Most transcriptome assembly projects use only one program for assembling 454 pyrosequencing reads, but there is no evidence that the programs used to date are optimal. We have carried out a systematic comparison of five assemblers (CAP3, MIRA, Newbler, SeqMan and CLC) to establish best practices for transcriptome assemblies, using a new dataset from the parasitic nematode Litomosoides sigmodontis.

RESULTS

Although no single assembler performed best on all our criteria, Newbler 2.5 gave longer contigs, better alignments to some reference sequences, and was fast and easy to use. SeqMan assemblies performed best on the criterion of recapitulating known transcripts, and had more novel sequence than the other assemblers, but generated an excess of small, redundant contigs. The remaining assemblers all performed almost as well, with the exception of Newbler 2.3 (the version currently used by most assembly projects), which generated assemblies that had significantly lower total length. As different assemblers use different underlying algorithms to generate contigs, we also explored merging of assemblies and found that the merged datasets not only aligned better to reference sequences than individual assemblies, but were also more consistent in the number and size of contigs.
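The criteria above (contig length, total assembly length, number and size of contigs) can be illustrated with a small summary script. This is a minimal sketch, not the authors' evaluation pipeline: the function names `read_fasta_lengths`, `n50`, and `summarize` are hypothetical, and N50 is used here as one common length statistic, although the abstract does not say which statistics were reported.

```python
from statistics import median


def read_fasta_lengths(lines):
    """Return the length of every sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # running length of the record being read
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(lengths):
    """Collect the basic size statistics used to compare assemblies."""
    return {
        "contigs": len(lengths),
        "total_bases": sum(lengths),
        "longest": max(lengths, default=0),
        "median": median(lengths) if lengths else 0,
        "n50": n50(lengths),
    }
```

Running `summarize(read_fasta_lengths(open(path)))` on each assembler's contig file (where `path` is a placeholder) gives one row per assembly. An excess of small, redundant contigs would appear as a high contig count with a low median, and a shortfall in total length as a low `total_bases`.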

CONCLUSIONS

Transcriptome assemblies are smaller than genome assemblies and thus should be more computationally tractable, but are often harder because individual contigs can have highly variable read coverage. Comparing single assemblers, Newbler 2.5 performed best on our trial data set, but other assemblers were closely comparable. Combining differently optimal assemblies from different programs however gave a more credible final product, and this strategy is recommended.
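To make the idea of combining assemblies concrete, the sketch below pools contigs from several assemblies and discards any contig that is wholly contained in a longer one on either strand. This is an assumption-laden illustration only: the abstract does not describe how the merged datasets were built, the names `reverse_complement` and `merge_nonredundant` are hypothetical, and exact-substring containment is a far cruder criterion than the overlap-based merging a real assembler would perform.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def merge_nonredundant(*assemblies):
    """Pool contigs from several assemblies and drop contained duplicates.

    Each assembly is an iterable of contig sequences. A contig is dropped
    when it (or its reverse complement) is an exact substring of a contig
    already kept. Contigs are visited longest first, so the longer copy wins.
    """
    pooled = sorted(
        {seq.upper() for assembly in assemblies for seq in assembly},
        key=lambda s: (-len(s), s),  # longest first, then alphabetical
    )
    kept = []
    for seq in pooled:
        rc = reverse_complement(seq)
        if any(seq in longer or rc in longer for longer in kept):
            continue
        kept.append(seq)
    return kept
```

The all-against-all containment check is quadratic in the number of contigs, which is acceptable for a demonstration but would need an index or an alignment-based tool for assemblies of realistic size.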

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0bd/3091720/9d6795793252/1471-2164-11-571-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0bd/3091720/3d7e70649851/1471-2164-11-571-1.jpg
