Optimization of de novo transcriptome assembly from high-throughput short read sequencing data improves functional annotation for non-model organisms.

Affiliation

Department of Chemical and Environmental Engineering, Yale University, New Haven, CT 06511, USA.

Publication information

BMC Bioinformatics. 2012 Jul 18;13:170. doi: 10.1186/1471-2105-13-170.

DOI:10.1186/1471-2105-13-170

PMID:22808927

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3489510/

Abstract

BACKGROUND

The k-mer hash length is a key factor affecting the output of de novo transcriptome assembly packages that use de Bruijn graph algorithms. Assemblies constructed with different single k-mer choices can lose unique contiguous sequences (contigs) and relevant biological information. A common solution to this problem is the clustering of single k-mer assemblies. Even though annotation is one of the primary goals of a transcriptome assembly, assessments of assembly success do not consider the impact of k-mer selection on the annotation output. This study provides an in-depth k-mer selection analysis focused on the degree of functional annotation achieved for a non-model organism for which no reference genome information is available. Individual k-mers and clustered assemblies (CA) were considered using three representative software packages. Pair-wise comparison analyses (between individual k-mers and CAs) were produced to reveal missing Kyoto Encyclopedia of Genes and Genomes (KEGG) ortholog identifiers (KOIs), and to determine a strategy that maximizes the recovery of biological information in a de novo transcriptome assembly.
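To illustrate why the k-mer length matters, the following minimal sketch builds a toy de Bruijn graph at several values of k and counts ambiguous (branching) nodes. The reads, the function names `de_bruijn_edges` and `branching_nodes`, and the chosen k values are illustrative assumptions, not data or code from the study or from any assembler.

```python
from collections import defaultdict

def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        # A read shorter than k contributes no k-mers at all.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def branching_nodes(graph):
    """Count nodes with more than one successor (ambiguous extensions)."""
    return sum(1 for successors in graph.values() if len(successors) > 1)

# Hypothetical toy reads from two transcripts that share the short motif "CGT".
reads = ["AACGTCC", "TTCGTGG"]

for k in (3, 5, 8):
    graph = de_bruijn_edges(reads, k)
    print(f"k={k}: nodes={len(graph)}, branching={branching_nodes(graph)}")
```

In this toy case a small k merges the two reads through the shared motif and creates branches, a larger k keeps them apart, and a k longer than the reads yields no graph at all. This is the trade-off that leaves each single k-mer assembly with its own set of contigs.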

RESULTS

Analyses of single k-mer assemblies resulted in the generation of various quantities of contigs and functional annotations within the selection window of k-mers (k-19 to k-63). For each k-mer in this window, generated assemblies contained certain unique contigs and KOIs that were not present in the other k-mer assemblies. Producing a non-redundant CA of k-mers 19 to 63 resulted in a more complete functional annotation than any single k-mer assembly. However, a fraction of unique annotations remained (~0.19 to 0.27% of total KOIs) in the assemblies of individual k-mers (k-19 to k-63) that were not present in the non-redundant CA. A workflow to recover these unique annotations is presented.
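The pair-wise comparison described above can be sketched as a set difference between the KOIs annotated in each single k-mer assembly and those annotated in the CA. This is a minimal sketch under stated assumptions: the KO identifiers, the k values, and the function names `missing_from_ca` and `percent_missing` are placeholders for illustration and do not reproduce the study's workflow or its results.

```python
def missing_from_ca(single_k_kois, ca_kois):
    """For each k, list the KOIs found in that assembly but absent from the CA."""
    return {k: sorted(kois - ca_kois) for k, kois in single_k_kois.items()}

def percent_missing(single_k_kois, ca_kois):
    """Share of all observed KOIs that the CA annotation does not contain."""
    all_kois = set(ca_kois).union(*single_k_kois.values())
    return 100.0 * len(all_kois - ca_kois) / len(all_kois)

# Placeholder annotations (hypothetical), keyed by k-mer length.
single_k_kois = {
    19: {"K00001", "K00002", "K00003"},
    31: {"K00002", "K00004"},
    63: {"K00002", "K00005"},
}
ca_kois = {"K00001", "K00002", "K00004", "K00005"}

print(missing_from_ca(single_k_kois, ca_kois))
print(percent_missing(single_k_kois, ca_kois))
```

KOIs reported for a given k by `missing_from_ca` point to the single k-mer assemblies whose contigs would need to be revisited to recover annotations the CA lost.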

CONCLUSIONS

This study demonstrated that different k-mer choices yield different numbers of unique contigs per single k-mer assembly, which affects the biological information that is retrievable from the transcriptome. This undesirable effect can be minimized, but not eliminated, by clustering multi-k assemblies with redundancy removal. The complete extraction of biological information in de novo transcriptomics studies requires both the production of a CA and efforts to identify unique contigs that are present in individual k-mer assemblies but not in the CA.
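As a rough illustration of clustering multi-k assemblies with redundancy removal, the sketch below pools contigs from several assemblies and drops any contig that is exactly contained in a longer one on either strand. This is a deliberately naive stand-in: real pipelines rely on dedicated sequence-identity clustering tools, and the contigs, k values, and function names here are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_nonredundant(assemblies):
    """Pool contigs from all assemblies, keeping only those not contained
    (on either strand) in a longer contig that was already kept."""
    pooled = {contig for contigs in assemblies.values() for contig in contigs}
    kept = []
    # Longest first, so shorter contigs are checked against longer ones.
    for contig in sorted(pooled, key=lambda s: (-len(s), s)):
        rc = reverse_complement(contig)
        if any(contig in longer or rc in longer for longer in kept):
            continue
        kept.append(contig)
    return kept

# Hypothetical contigs keyed by the k-mer length of the assembly they came from.
assemblies = {
    19: ["ATGGCGTACC", "GGCGT"],
    31: ["ATGGCGTACC", "TTTTGCA"],
    63: ["ACGCC"],
}

print(merge_nonredundant(assemblies))
```

Exact containment is a much stricter criterion than the identity thresholds that clustering tools apply, so it only illustrates the idea: any merging rule of this kind can discard a contig whose annotation is then absent from the CA.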

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ba6/3489510/bfb8a64c9a85/1471-2105-13-170-2.jpg
