NDM Experimental Medicine, University of Oxford, John Radcliffe Hospital, Oxfordshire OX3 9DU, United Kingdom.
The National Institute for Health Research Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford, John Radcliffe Hospital, Oxfordshire OX3 9DU, United Kingdom.
Bioinformatics. 2023 Dec 1;39(12). doi: 10.1093/bioinformatics/btad728.
Microbial sequences generated from clinical samples are often contaminated with human host sequences that must be removed for ethical and legal reasons. Care must be taken to excise host sequences without inadvertently removing target microbial sequences to the detriment of downstream analyses such as variant calling and de novo assembly.
To facilitate accurate host decontamination of both short and long sequencing reads, we developed Hostile, a tool capable of accurate host read removal using a laptop. We demonstrate that our approach removes at least 99.6% of real human reads and retains at least 99.989% of simulated bacterial reads. Using Hostile with a masked reference genome further increases bacterial read retention (≥99.997%) with negligible (≤0.001%) reduction in human read removal performance. Compared with an existing tool, Hostile removes 21%-23% more human short reads and 21-43 times fewer bacterial reads, typically in less time.
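The two headline figures above (human read removal and bacterial read retention) are simple ratios of read counts before and after decontamination. A minimal sketch of how such percentages could be computed follows; the function names and the example counts are hypothetical illustrations, not part of Hostile or of the study's evaluation code.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage; whole must be positive."""
    if whole <= 0:
        raise ValueError("whole must be a positive read count")
    return 100.0 * part / whole


def decontamination_metrics(
    human_in: int, human_out: int, bacterial_in: int, bacterial_out: int
) -> dict:
    """Summarise host removal and microbial retention from read counts.

    *_in  : reads of that origin before decontamination
    *_out : reads of that origin remaining after decontamination
    """
    return {
        # Share of human reads that were removed
        "human_removed_pct": percent(human_in - human_out, human_in),
        # Share of bacterial reads that survived
        "bacterial_retained_pct": percent(bacterial_out, bacterial_in),
    }


# Hypothetical counts chosen only to illustrate the arithmetic
metrics = decontamination_metrics(
    human_in=1_000_000,
    human_out=4_000,
    bacterial_in=1_000_000,
    bacterial_out=999_890,
)
```

With these made-up counts, 4,000 surviving human reads out of one million corresponds to 99.6% removal, and 110 lost bacterial reads out of one million corresponds to 99.989% retention.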
Hostile is implemented as an MIT-licensed Python package available from https://github.com/bede/hostile together with supplementary material.