BlobToolKit - Interactive Quality Assessment of Genome Assemblies.

Affiliations

Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3JT, UK.

Wellcome Sanger Institute, Cambridge CB10 1SA, UK.

Publication information

G3 (Bethesda). 2020 Apr 9;10(4):1361-1374. doi: 10.1534/g3.119.400908.

Abstract

Reconstruction of target genomes from sequence data produced by instruments that are agnostic as to the species-of-origin may be confounded by contaminant DNA. Whether introduced during sample processing or through co-extraction alongside the target DNA, if insufficient care is taken during the assembly process, the final assembled genome may be a mixture of data from several species. Such assemblies can confound sequence-based biological inference and, when deposited in public databases, may be included in downstream analyses by users unaware of underlying problems. We present BlobToolKit, a software suite to aid researchers in identifying and isolating non-target data in draft and publicly available genome assemblies. BlobToolKit can be used to process assembly, read and analysis files for fully reproducible interactive exploration in the browser-based Viewer. BlobToolKit can be used during assembly to filter non-target DNA, helping researchers produce assemblies with high biological credibility. We have been running an automated BlobToolKit pipeline on eukaryotic assemblies publicly available in the International Nucleotide Sequence Database Collaboration and are making the results available through a public instance of the Viewer at https://blobtoolkit.genomehubs.org/view. We aim to complete analysis of all publicly available genomes and then maintain currency with the flow of new genomes. We have worked to embed these views into the presentation of genome assemblies at the European Nucleotide Archive, providing an indication of assembly quality alongside the public record with links out to allow full exploration in the Viewer.
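The abstract does not describe how BlobToolKit works internally. As a minimal, hypothetical sketch of what "identifying and isolating non-target data" amounts to once each contig carries a taxonomic label, the code below splits an assembly into target, unassigned and non-target contigs. It assumes the labels already exist (how BlobToolKit derives them is not covered here), and the names `Contig` and `partition_by_taxon` are illustrative only, not part of BlobToolKit's API.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int
    taxon: str  # hypothetical per-contig taxonomic label, e.g. a phylum name


def partition_by_taxon(contigs, target_taxa, unassigned_label="no-hit"):
    """Split contigs into (kept, flagged, removed).

    kept:    contigs assigned to one of the target taxa
    flagged: contigs with no assignment, set aside for manual review
    removed: contigs assigned to any other (non-target) taxon
    """
    kept, flagged, removed = [], [], []
    for contig in contigs:
        if contig.taxon in target_taxa:
            kept.append(contig)
        elif contig.taxon == unassigned_label:
            flagged.append(contig)
        else:
            removed.append(contig)
    return kept, flagged, removed


def span(contigs):
    """Total length in bases of a list of contigs."""
    return sum(c.length for c in contigs)


if __name__ == "__main__":
    # Made-up toy assembly for illustration only.
    assembly = [
        Contig("ctg1", 1_200_000, "Arthropoda"),
        Contig("ctg2", 850_000, "Arthropoda"),
        Contig("ctg3", 40_000, "Proteobacteria"),
        Contig("ctg4", 15_000, "no-hit"),
    ]
    kept, flagged, removed = partition_by_taxon(assembly, {"Arthropoda"})
    print([c.name for c in kept])     # ['ctg1', 'ctg2']
    print([c.name for c in flagged])  # ['ctg4']
    print([c.name for c in removed])  # ['ctg3']
    print(span(removed))              # 40000
```

Unassigned contigs are flagged rather than discarded because a missing database hit does not by itself show that a sequence is a contaminant; removing them automatically could strip genuine target sequence from the assembly.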
