HoCoRT: host contamination removal tool.

Affiliations

Centre for Bioinformatics, Department of Informatics, University of Oslo, PO Box 1080 Blindern, 0316, Oslo, Norway.

Centre for Bioinformatics, Department of Pharmacy, University of Oslo, PO Box 1068 Blindern, 0316, Oslo, Norway.

Publication information

BMC Bioinformatics. 2023 Oct 2;24(1):371. doi: 10.1186/s12859-023-05492-w.

Abstract

BACKGROUND

Shotgun metagenome sequencing data obtained from a host environment are usually contaminated with sequences from the host organism. Host sequences should be removed before further analysis to avoid biases, reduce downstream computational load, or, for a human host, protect privacy. The existing tools we identified that are designed specifically for host contamination removal were outdated, unmaintained, or complicated to use. Consequently, we developed HoCoRT, a fast and user-friendly tool that implements several methods for optimised host sequence removal, and we evaluated the speed and accuracy of these methods.

RESULTS

HoCoRT is an open-source command-line tool for host contamination removal. It is designed to be easy to install and use, offering a one-step option for genome indexing. HoCoRT employs a variety of well-known mapping, classification, and alignment methods to classify reads. The user can select the underlying classification method and its parameters, allowing adaptation to different scenarios. Based on our investigation of various methods and parameters using synthetic human gut and oral microbiomes, and on assessment of publicly available data, we provide recommendations for typical datasets with short and long reads.
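The core idea of host contamination removal can be illustrated with a minimal Python sketch. It is not HoCoRT's implementation: it assumes that some upstream mapper or classifier has already produced a set of read IDs flagged as host, and it simply drops those reads from a FASTQ stream. The function names `parse_fastq` and `remove_host_reads` and the toy reads are hypothetical and exist only for illustration.

```python
from typing import Iterable, Iterator, List, Set, Tuple

# (read_id, sequence, quality); a simplified view of a FASTQ record
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from well-formed, four-line-per-read FASTQ text."""
    stripped = (line.rstrip("\n") for line in lines)
    # Consume the stream four lines at a time: header, sequence, '+', quality.
    for header, seq, _plus, qual in zip(stripped, stripped, stripped, stripped):
        read_id = header[1:].split()[0]  # drop the leading '@' and any description
        yield read_id, seq, qual


def remove_host_reads(records: Iterable[FastqRecord],
                      host_ids: Set[str]) -> List[FastqRecord]:
    """Keep only reads whose ID was NOT flagged as host by the classifier."""
    return [rec for rec in records if rec[0] not in host_ids]


# Toy input: three reads, one of which (r2) is assumed to be flagged as host.
fastq_text = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n"
records = list(parse_fastq(fastq_text.splitlines()))
host_ids = {"r2"}  # hypothetical output of a mapping or classification step
kept = remove_host_reads(records, host_ids)
kept_ids = [rec[0] for rec in kept]
print(kept_ids)
```

In practice the classification step is the hard part, which is why the choice of underlying method and its parameters matters; the filtering itself is a simple set lookup.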

CONCLUSIONS

To decontaminate a human gut microbiome with short reads using HoCoRT, we found the optimal combination of speed and accuracy with BioBloom, Bowtie2 in end-to-end mode, and HISAT2. Kraken2 consistently demonstrated the highest speed, albeit with a trade-off in accuracy. The same applies to an oral microbiome, but here Bowtie2 was notably slower than the other tools. For long reads, the detection of human host reads is more difficult. In this case, a combination of Kraken2 and Minimap2 achieved the highest accuracy and detected 59% of human reads. In comparison to the dedicated DeconSeq tool, HoCoRT using Bowtie2 in end-to-end mode proved considerably faster and slightly more accurate. HoCoRT is available as a Bioconda package, and the source code can be accessed at https://github.com/ignasrum/hocort along with the documentation. It is released under the MIT licence and is compatible with Linux and macOS (except for the BioBloom module).
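When a synthetic dataset is used, the true origin of every read is known, so a method's output can be scored directly. The sketch below shows one plausible way to do this. The abstract does not define its accuracy measure, so the definitions here (sensitivity, precision, and overall accuracy from a confusion matrix) are assumptions, and the read IDs are invented toy data. Under these definitions, a statement such as "detected 59% of human reads" would correspond to a sensitivity of 0.59.

```python
from typing import Dict, Set


def classification_metrics(all_reads: Set[str],
                           true_host: Set[str],
                           predicted_host: Set[str]) -> Dict[str, float]:
    """Score a host-read classifier against known ground truth."""
    tp = len(true_host & predicted_host)   # host reads correctly flagged
    fp = len(predicted_host - true_host)   # microbial reads wrongly flagged
    fn = len(true_host - predicted_host)   # host reads that were missed
    tn = len(all_reads) - tp - fp - fn     # microbial reads correctly kept
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "accuracy": (tp + tn) / len(all_reads) if all_reads else 0.0,
    }


# Toy ground truth: 10 reads, 4 of which are truly from the host.
all_reads = {f"r{i}" for i in range(1, 11)}
true_host = {"r1", "r2", "r3", "r4"}
# Hypothetical classifier output: finds 2 host reads, wrongly flags r5.
predicted_host = {"r1", "r2", "r5"}

metrics = classification_metrics(all_reads, true_host, predicted_host)
print(metrics)
```

Reporting sensitivity alongside accuracy is useful because host reads are often a minority of the data, so a method can score a high overall accuracy while still missing many host reads.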

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/92a8/10544359/8baa0b810135/12859_2023_5492_Fig1_HTML.jpg
