A consensus-based ensemble approach to improve transcriptome assembly.

Affiliations

School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA.

Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA.

Publication information

BMC Bioinformatics. 2021 Oct 21;22(1):513. doi: 10.1186/s12859-021-04434-8.

DOI:10.1186/s12859-021-04434-8
PMID:34674629
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8532302/
Abstract

BACKGROUND

Systems-level analyses, such as differential gene expression analysis, co-expression analysis, and metabolic pathway reconstruction, depend on the accuracy of the transcriptome. Multiple tools exist to perform transcriptome assembly from RNAseq data. However, assembling high quality transcriptomes is still not a trivial problem. This is especially the case for non-model organisms where adequate reference genomes are often not available. Different methods produce different transcriptome models and there is no easy way to determine which are more accurate. Furthermore, having alternative-splicing events exacerbates such difficult assembly problems. While benchmarking transcriptome assemblies is critical, this is also not trivial due to the general lack of true reference transcriptomes.

RESULTS

In this study, we first provide a pipeline to generate simulated benchmark transcriptomes and the corresponding RNAseq data. Using these simulated benchmarking datasets, we compared the performance of various transcriptome assembly approaches, including both de novo and genome-guided methods. The results showed that assembly performance deteriorates significantly when alternative transcripts (isoforms) exist or, for genome-guided methods, when the reference is not available from the same genome. To improve transcriptome assembly performance, we present ConSemble, a new consensus-based ensemble transcriptome assembly approach that leverages the overlapping predictions between different assemblies.
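The consensus idea described above can be illustrated with a minimal sketch. It assumes that each assembler's output is reduced to a set of sequences and that a sequence is kept when at least `min_support` assemblers report it identically. The function name `consensus_contigs`, the `min_support` parameter, the assembler labels, and the toy sequences are all hypothetical; the abstract does not state which sequences are compared or what threshold ConSemble actually applies, so this is not the authors' implementation.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def consensus_contigs(assemblies: Dict[str, Iterable[str]], min_support: int) -> Set[str]:
    """Return the sequences reported by at least `min_support` assemblers.

    Assumption: two predictions "overlap" only if the sequences are identical.
    """
    votes: Counter = Counter()
    for sequences in assemblies.values():
        # De-duplicate within one assembler so each tool votes at most once per sequence.
        for seq in set(sequences):
            votes[seq] += 1
    return {seq for seq, count in votes.items() if count >= min_support}


# Toy input: four hypothetical assemblers and made-up sequences.
assemblies = {
    "assembler_a": ["ATGGCC", "ATGTTT", "ATGAAA"],
    "assembler_b": ["ATGGCC", "ATGTTT"],
    "assembler_c": ["ATGGCC", "ATGCCC"],
    "assembler_d": ["ATGGCC", "ATGTTT", "ATGCCC"],
}

consensus = consensus_contigs(assemblies, min_support=3)
print(sorted(consensus))  # ['ATGGCC', 'ATGTTT']
```

Raising `min_support` in this sketch discards sequences that only one or two tools produced, which is the intuition behind removing incorrectly assembled contigs; lowering it keeps more sequences at the cost of more unsupported ones.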

CONCLUSIONS

Without using a reference genome, ConSemble using four de novo assemblers achieved an accuracy up to twice as high as that of any de novo assembler we compared. When a reference genome is available, ConSemble using four genome-guided assemblies removed many incorrectly assembled contigs with minimal impact on correctly assembled contigs, achieving higher precision and accuracy than individual genome-guided methods. Furthermore, ConSemble using de novo assemblers matched or exceeded the best-performing genome-guided assemblers even when the transcriptomes included isoforms. We thus demonstrated that the ConSemble consensus strategy, for both de novo and genome-guided assemblers, can improve transcriptome assembly. The RNAseq simulation pipeline, the benchmark transcriptome datasets, and the script to perform the ConSemble assembly are all freely available from: http://bioinfolab.unl.edu/emlab/consemble/ .
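To make the benchmarking comparison concrete, the following sketch scores an assembly against a simulated reference transcriptome. It assumes that both are represented as sets of transcript identifiers (or exact sequences) and uses the standard set-based definitions of precision, recall, and F1. The abstract does not define how its "precision" and "accuracy" are computed, so these formulas, the function name `benchmark_scores`, and the toy identifiers are assumptions for illustration only.

```python
from typing import Dict, Set


def benchmark_scores(assembled: Set[str], reference: Set[str]) -> Dict[str, float]:
    """Score an assembly against a known (simulated) reference set.

    Assumption: a contig is "correct" only if it appears in the reference set.
    """
    tp = len(assembled & reference)   # correctly assembled
    fp = len(assembled - reference)   # assembled but not in the reference
    fn = len(reference - assembled)   # reference transcripts that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy example with made-up transcript identifiers.
reference = {"T1", "T2", "T3", "T4"}
assembled = {"T1", "T2", "T5"}

scores = benchmark_scores(assembled, reference)
print(scores)
```

In this toy case two of the three assembled transcripts are in the reference and two reference transcripts are missed, so removing the unsupported `T5` would raise precision without changing recall.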

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f011/8532302/367d0826adf1/12859_2021_4434_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f011/8532302/ebcd353ac73f/12859_2021_4434_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f011/8532302/8bd0d2d91458/12859_2021_4434_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f011/8532302/ef92fdbc13c5/12859_2021_4434_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f011/8532302/138462a45fec/12859_2021_4434_Fig5_HTML.jpg