A scalable and accurate targeted gene assembly tool (SAT-Assembler) for next-generation sequencing data.

Author information

Zhang Yuan, Sun Yanni, Cole James R

Affiliations

Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, United States of America.

Center for Microbial Ecology, Michigan State University, East Lansing, Michigan, United States of America.

Publication information

PLoS Comput Biol. 2014 Aug 14;10(8):e1003737. doi: 10.1371/journal.pcbi.1003737. eCollection 2014 Aug.

DOI:10.1371/journal.pcbi.1003737

PMID:25122209

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4133164/

Abstract

Gene assembly, which recovers gene segments from short reads, is an important step in the functional analysis of next-generation sequencing data. Because high-quality reference genomes are often unavailable, de novo assembly is commonly used for RNA-Seq data from non-model organisms and for metagenomic data. However, de novo assembly faces three challenges: heterogeneous sequence coverage caused by uneven expression or species abundance, similarity between isoforms or homologous genes, and large data size. As a result, existing assembly tools tend to output fragmented or chimeric contigs, or to have a high memory footprint. In this work, we introduce a targeted gene assembly program, SAT-Assembler, which aims to recover gene families of particular interest to biologists. It addresses the above challenges by conducting family-specific homology search, homology-guided overlap graph construction, and careful graph traversal. It can be applied to both RNA-Seq and metagenomic data. Our experimental results on an Arabidopsis RNA-Seq data set and two metagenomic data sets show that, compared with other assembly tools, SAT-Assembler has lower memory usage, comparable or better gene coverage, and a lower chimera rate when assembling a set of genes from one or multiple pathways. Moreover, the family-specific design and rapid homology search make SAT-Assembler naturally compatible with parallel computing platforms. The source code of SAT-Assembler is available at https://sourceforge.net/projects/sat-assembler/. The data sets and experimental settings can be found in the supplementary material.
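
The idea of homology-guided overlap graph construction followed by graph traversal can be illustrated with a toy sketch. This is not SAT-Assembler's implementation: the `AlignedRead` type, the `model_start`/`model_end` fields, the exact-match overlap test, and the greedy traversal are all assumptions made for illustration. The sketch assumes a prior homology search has already placed each read on a gene-family model, and it only tests read pairs for overlap when their model intervals intersect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedRead:
    name: str
    seq: str
    model_start: int  # hypothetical: first family-model position the read aligns to
    model_end: int    # hypothetical: last family-model position (inclusive)


def suffix_prefix_overlap(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b, or 0 if too short."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def build_overlap_graph(reads, min_overlap=3):
    """Directed graph: name -> list of (successor name, overlap length).

    Only pairs whose model intervals intersect are compared, so reads that
    share a short sequence by chance but sit far apart on the model are
    never connected.
    """
    graph = {r.name: [] for r in reads}
    ordered = sorted(reads, key=lambda r: r.model_start)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.model_start > a.model_end:
                break  # sorted by start: no later read can intersect a
            k = suffix_prefix_overlap(a.seq, b.seq, min_overlap)
            if k:
                graph[a.name].append((b.name, k))
    return graph


def greedy_contigs(reads, graph):
    """Start at reads with no predecessor and follow the longest overlap."""
    by_name = {r.name: r for r in reads}
    has_incoming = {dst for edges in graph.values() for dst, _ in edges}
    contigs = []
    for r in sorted(reads, key=lambda r: r.model_start):
        if r.name in has_incoming:
            continue
        contig, current, seen = r.seq, r.name, {r.name}
        while True:
            candidates = [(k, dst) for dst, k in graph[current] if dst not in seen]
            if not candidates:
                break
            k, dst = max(candidates)
            contig += by_name[dst].seq[k:]  # append only the non-overlapping tail
            seen.add(dst)
            current = dst
        contigs.append(contig)
    return contigs


# Toy data: r1..r3 tile one gene; r4 shares "CGTA" with the end of r1
# but aligns to a distant part of the model.
reads = [
    AlignedRead("r1", "ATGGCGTA", 0, 7),
    AlignedRead("r2", "GCGTACGT", 3, 10),
    AlignedRead("r3", "ACGTTAGC", 7, 14),
    AlignedRead("r4", "CGTAAAAA", 40, 47),
]
graph = build_overlap_graph(reads)
contigs = greedy_contigs(reads, graph)
print(contigs)
```

In this toy input, a purely sequence-based overlap test would join `r1` to `r4` through the shared `CGTA`, which is the kind of spurious edge that can lead to chimeric contigs; restricting comparisons by model position leaves `r4` as its own contig. A real tool would also need to tolerate sequencing errors and gapped alignments, which this sketch ignores.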
