Blobology: exploring raw genome data for contaminants, symbionts and parasites using taxon-annotated GC-coverage plots.

Affiliations

Institute of Evolutionary Biology, Ashworth Laboratories, University of Edinburgh, Edinburgh, UK.

Institute of Evolutionary Biology, Ashworth Laboratories, University of Edinburgh, Edinburgh, UK; Edinburgh Genomics, University of Edinburgh, Edinburgh, UK.

Publication details

Front Genet. 2013 Nov 29;4:237. doi: 10.3389/fgene.2013.00237. eCollection 2013.

DOI: 10.3389/fgene.2013.00237

PMID: 24348509

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3843372/

Abstract

Generating the raw data for a de novo genome assembly project for a target eukaryotic species is relatively easy. This democratization of access to large-scale data has allowed many research teams to plan to assemble the genomes of non-model organisms. These new genome targets are very different from the traditional, inbred, laboratory-reared model organisms. They are often small, and cannot be isolated free of their environment: whether ingested food, the surrounding host organism of parasites, or commensal and symbiotic organisms attached to or within the individuals sampled. Preparation of pure DNA originating from a single species can be technically impossible, yet assembly of mixed-organism DNA can be difficult, as most genome assemblers perform poorly when faced with multiple genomes present in different stoichiometries. This class of problem is common in metagenomic datasets, which deliberately try to capture all the genomes present in an environment, but replicon assembly is not often the goal of such programs. Here we present an approach to extracting, from mixed DNA sequence data, subsets that correspond to single species' genomes, thus improving genome assembly. Through the use of taxon-annotated GC-coverage plots (TAGC plots), we combine numerical (proportion of GC bases and read coverage) and biological (best-matching sequence in annotated databases) indicators to partition draft assembly contigs, and the reads that contribute to those contigs, into distinct bins that can then be subjected to rigorous, optimized assembly. We also present Blobsplorer, a tool that aids exploration and selection of subsets from TAGC-annotated data. Partitioning the data in this way can rescue poorly assembled genomes, and reveal unexpected symbionts and commensals in eukaryotic genome projects. The TAGC plot pipeline script is available from https://github.com/blaxterlab/blobology, and the Blobsplorer tool from https://github.com/mojones/Blobsplorer.
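To make the three TAGC axes concrete, here is a minimal Python sketch. It is not code from the blobology pipeline. It assumes each draft contig already carries the total number of read bases aligned to it (`mapped_bases`) and a best-hit taxon label (`taxon`) from a database search. From these it computes GC proportion and mean coverage per contig, applies a hypothetical selection rule (`partition`, with illustrative thresholds), and pulls out the IDs of reads mapped to the selected contigs. All class names, function names and thresholds here are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    seq: str
    mapped_bases: int  # assumed input: total read bases aligned to this contig
    taxon: str         # assumed input: taxon of the best database hit


def gc_proportion(seq: str) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases; Ns are ignored."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    total = gc + at
    return gc / total if total else 0.0


def mean_coverage(contig: Contig) -> float:
    """Average read depth: aligned read bases divided by contig length."""
    return contig.mapped_bases / len(contig.seq) if contig.seq else 0.0


def tagc_table(contigs):
    """One (name, GC, coverage, taxon) row per contig: the data behind a TAGC plot."""
    return [(c.name, gc_proportion(c.seq), mean_coverage(c), c.taxon) for c in contigs]


def partition(rows, keep_taxa, min_cov=0.0, gc_range=(0.0, 1.0)):
    """Hypothetical rule: a contig goes to 'target' if its best-hit taxon is in
    keep_taxa and its coverage and GC fall inside the chosen window."""
    gc_lo, gc_hi = gc_range
    bins = defaultdict(list)
    for name, gc, cov, taxon in rows:
        keep = taxon in keep_taxa and cov >= min_cov and gc_lo <= gc <= gc_hi
        bins["target" if keep else "other"].append(name)
    return dict(bins)


def reads_for_bin(read_to_contig, selected):
    """IDs of reads aligned to a selected contig (unmapped reads are not handled here)."""
    chosen = set(selected)
    return sorted(read for read, contig in read_to_contig.items() if contig in chosen)


if __name__ == "__main__":
    demo = [
        Contig("c1", "ATGCATGCAT", 300, "Nematoda"),         # GC 0.4, coverage 30
        Contig("c2", "GGGCCCGGCC", 2000, "Proteobacteria"),  # GC 1.0, coverage 200
        Contig("c3", "ATATATNNAT", 80, "Nematoda"),          # GC 0.0, coverage 8
    ]
    rows = tagc_table(demo)
    bins = partition(rows, {"Nematoda"}, min_cov=10)
    print(bins)  # {'target': ['c1'], 'other': ['c2', 'c3']}
    print(reads_for_bin({"r1": "c1", "r2": "c2", "r3": "c1"}, bins["target"]))  # ['r1', 'r3']
```

In the approach described above, subsets are chosen by exploring the plotted data (for example with Blobsplorer). The fixed thresholds in `partition` only stand in for that interactive step.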



Similar articles

Fragmentation and Coverage Variation in Viral Metagenome Assemblies, and Their Effect in Diversity Calculations.

Front Bioeng Biotechnol. 2015 Sep 17;3:141. doi: 10.3389/fbioe.2015.00141. eCollection 2015.

MarkerScan: Separation and assembly of cobionts sequenced alongside target species in biodiversity genomics projects.

Wellcome Open Res. 2024 Feb 13;9:33. doi: 10.12688/wellcomeopenres.20730.1. eCollection 2024.

Assessment of metagenomic assemblers based on hybrid reads of real and simulated metagenomic sequences.

Brief Bioinform. 2020 May 21;21(3):777-790. doi: 10.1093/bib/bbz025.

Optimizing and evaluating the reconstruction of Metagenome-assembled microbial genomes.

BMC Genomics. 2017 Nov 28;18(1):915. doi: 10.1186/s12864-017-4294-1.

Evaluation of short read metagenomic assembly.

BMC Genomics. 2011;12 Suppl 2(Suppl 2):S8. doi: 10.1186/1471-2164-12-S2-S8. Epub 2011 Jul 27.

Binnacle: Using Scaffolds to Improve the Contiguity and Quality of Metagenomic Bins.

Front Microbiol. 2021 Feb 24;12:638561. doi: 10.3389/fmicb.2021.638561. eCollection 2021.

Organelle_PBA, a pipeline for assembling chloroplast and mitochondrial genomes from PacBio DNA sequencing data.

BMC Genomics. 2017 Jan 7;18(1):49. doi: 10.1186/s12864-016-3412-9.

NeatFreq: reference-free data reduction and coverage normalization for De Novo sequence assembly.

BMC Bioinformatics. 2014 Nov 19;15(1):357. doi: 10.1186/s12859-014-0357-3.

V-GAP: Viral genome assembly pipeline.

Gene. 2016 Feb 1;576(2 Pt 1):676-80. doi: 10.1016/j.gene.2015.10.029. Epub 2015 Oct 22.

Cited by

Evolutionary Convergence of Nutritional Symbionts in Ticks.

Environ Microbiol Rep. 2025 Jun;17(3):e70120. doi: 10.1111/1758-2229.70120.

Small but Mitey: A Gapless Telomere-to-Telomere Assembly of an Unidentified Mite With a Streamlined Genome.

Genome Biol Evol. 2025 Feb 3;17(2). doi: 10.1093/gbe/evaf023.

A chromosome-level genome assembly of the cabbage aphid Brevicoryne brassicae.

Sci Data. 2025 Jan 28;12(1):167. doi: 10.1038/s41597-025-04501-2.

Hepatincolaceae (Alphaproteobacteria) are Distinct From Holosporales and Independently Evolved to Associate With Ecdysozoa.

Environ Microbiol. 2025 Jan;27(1):e70028. doi: 10.1111/1462-2920.70028.

Genomes of two invasive species (hemlock woolly adelgid and pineapple gall adelgid) enable characterization of nicotinic acetylcholine receptors.

bioRxiv. 2024 Nov 26:2024.11.21.624573. doi: 10.1101/2024.11.21.624573.

Chromosome-level genome assemblies and genetic maps reveal heterochiasmy and macrosynteny in endangered Atlantic Acropora.

BMC Genomics. 2024 Nov 20;25(1):1119. doi: 10.1186/s12864-024-11025-3.

Horizontal gene transfer and symbiotic microorganisms regulate the adaptive evolution of intertidal algae, Porphyra sense lato.

Commun Biol. 2024 Aug 11;7(1):976. doi: 10.1038/s42003-024-06663-y.

Solving genomic puzzles: computational methods for metagenomic binning.

Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae372.

Microbiome Taxonomic and Functional Differences in C3H/HeJ Mice Fed a Long-Term High-Fat Diet with Beef Protein ± Ammonium Hydroxide Supplementation.

Nutrients. 2024 May 25;16(11):1613. doi: 10.3390/nu16111613.

IMA Genome - F19 : A genome assembly and annotation guide to empower mycologists, including annotated draft genome sequences of Ceratocystis pirilliformis, Diaporthe australafricana, Fusarium ophioides, Paecilomyces lecythidis, and Sporothrix stenoceras.

IMA Fungus. 2024 Jun 3;15(1):12. doi: 10.1186/s43008-024-00142-z.

