Optimization of de novo transcriptome assembly from next-generation sequencing data.

Affiliation

Department of Zoology and Animal Biology, University of Geneva, 1211 Geneva 4, Switzerland.

Publication information

Genome Res. 2010 Oct;20(10):1432-40. doi: 10.1101/gr.103846.109. Epub 2010 Aug 6.

DOI:10.1101/gr.103846.109

PMID:20693479

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2945192/

Abstract

Transcriptome analysis has important applications in many biological fields. However, assembling a transcriptome without a known reference remains a challenging task requiring algorithmic improvements. We present two methods for substantially improving transcriptome de novo assembly. The first method relies on the observation that the use of a single k-mer length by current de novo assemblers is suboptimal to assemble transcriptomes where the sequence coverage of transcripts is highly heterogeneous. We present the Multiple-k method in which various k-mer lengths are used for de novo transcriptome assembly. We demonstrate its good performance by assembling de novo a published next-generation transcriptome sequence data set of Aedes aegypti, using the existing genome to check the accuracy of our method. The second method relies on the use of a reference proteome to improve the de novo assembly. We developed the Scaffolding using Translation Mapping (STM) method that uses mapping against the closest available reference proteome for scaffolding contigs that map onto the same protein. In a controlled experiment using simulated data, we show that the STM method considerably improves the assembly, with few errors. We applied these two methods to assemble the transcriptome of the non-model catfish Loricaria gr. cataphracta. Using the Multiple-k and STM methods, the assembly increases in contiguity and in gene identification, showing that our methods clearly improve quality and can be widely used. The new methods were used to assemble successfully the transcripts of the core set of genes regulating tooth development in vertebrates, while classic de novo assembly failed.
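The two ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how assemblies from different k-mer lengths are merged, nor how contigs are ordered within a scaffold. The sketch assumes that (a) contigs from several single-k assemblies are pooled and any contig contained in a longer one (on either strand) is dropped, and (b) contigs whose best translated hit falls on the same reference protein are joined in order of their alignment start on that protein, separated by a run of `N`. The names `pool_multi_k`, `scaffold_by_protein`, and `Hit` are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def pool_multi_k(assemblies: dict[int, list[str]]) -> list[str]:
    """Pool contigs from several single-k assemblies (assumed merge rule).

    `assemblies` maps a k-mer length to the contigs produced with that k.
    A contig is dropped when it, or its reverse complement, is a substring
    of a longer contig that was already kept.
    """
    pooled = sorted(
        {c for contigs in assemblies.values() for c in contigs},
        key=lambda s: (-len(s), s),  # longest first, ties broken alphabetically
    )
    kept: list[str] = []
    for contig in pooled:
        rc = reverse_complement(contig)
        if not any(contig in longer or rc in longer for longer in kept):
            kept.append(contig)
    return kept


class Hit(NamedTuple):
    """Hypothetical record: best translated hit of a contig on a protein."""
    contig_id: str
    protein_id: str
    protein_start: int  # start of the alignment on the reference protein


def scaffold_by_protein(
    contigs: dict[str, str], hits: list[Hit], gap: str = "N" * 10
) -> dict[str, str]:
    """Join contigs that map onto the same reference protein.

    Contigs are ordered by where their alignment starts on the protein
    (an assumption) and concatenated with `gap` between them.
    """
    by_protein: dict[str, list[Hit]] = defaultdict(list)
    for hit in hits:
        by_protein[hit.protein_id].append(hit)

    scaffolds: dict[str, str] = {}
    for protein_id, group in by_protein.items():
        ordered = sorted(group, key=lambda h: h.protein_start)
        scaffolds[protein_id] = gap.join(contigs[h.contig_id] for h in ordered)
    return scaffolds
```

In a real pipeline the hits would come from a translated search of the contigs against the reference proteome, and overlap, strand, and frame would need handling that this sketch leaves out.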

