Moller Abraham G, Liang Chun
Department of Biology, Miami University, Oxford, OH, United States of America.
PeerJ. 2017 Sep 7;5:e3788. doi: 10.7717/peerj.3788. eCollection 2017.
Clustered regularly interspaced short palindromic repeat (CRISPR) systems are the adaptive immune systems of bacteria and archaea against viral infection. While CRISPRs have been exploited as a tool for genetic engineering, their spacer sequences can also provide valuable insights into microbial ecology by linking environmental viruses to their microbial hosts. Despite this importance, metagenomic CRISPR detection remains a major challenge. Here we present a reference-guided CRISPR spacer detection tool, the Metagenomic CRISPR Reference-Aided Search Tool (MetaCRAST), that constrains searches based on user-specified direct repeats (DRs). These DRs can be those expected from the assembly or taxonomic profile of a metagenome. We compared the performance of MetaCRAST to that of two existing metagenomic CRISPR detection tools, Crass and MinCED, using both real and simulated acid mine drainage (AMD) and enhanced biological phosphorus removal (EBPR) metagenomes. Our evaluation shows that MetaCRAST improves CRISPR spacer detection in real metagenomes compared to Crass and MinCED. Evaluations on simulated metagenomes show that it performs better than these tools for Illumina metagenomes and comparably for 454 metagenomes. Its dependence on read length and community composition, its run time, and its accuracy are also comparable to those of these tools. MetaCRAST is implemented in Perl, is parallelizable through the Many Core Engine (MCE), and takes metagenomic sequence reads and direct repeat queries (FASTA or FASTQ) as input. It is freely available for download at https://github.com/molleraj/MetaCRAST.
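To illustrate what constraining a search by a user-specified DR can mean in practice, the following is a minimal Python sketch of the general idea: locate copies of a known DR in a read and report the sequences between consecutive copies as candidate spacers. It is not MetaCRAST's implementation (which is in Perl). The function name `extract_spacers`, the exact-match search, and the 20-60 bp spacer length bounds are assumptions made only for this illustration.

```python
def extract_spacers(read, direct_repeat, min_len=20, max_len=60):
    """Return sequences lying between consecutive exact copies of direct_repeat.

    Hypothetical helper for illustration only: exact matching and the
    length bounds are assumptions, not MetaCRAST's actual behavior.
    """
    # Collect start positions of non-overlapping exact DR matches.
    positions = []
    start = read.find(direct_repeat)
    while start != -1:
        positions.append(start)
        start = read.find(direct_repeat, start + len(direct_repeat))

    # A candidate spacer is the sequence between two adjacent DR copies.
    spacers = []
    for left, right in zip(positions, positions[1:]):
        spacer = read[left + len(direct_repeat):right]
        if min_len <= len(spacer) <= max_len:
            spacers.append(spacer)
    return spacers


if __name__ == "__main__":
    dr = "GTTTTAGAGCTATGCT"  # made-up DR query for the example
    s1 = "ACGTACGTACGTACGTACGTACGT"
    s2 = "TTGACCATGGCATCAGGTCAAGCTA"
    read = "AA" + dr + s1 + dr + s2 + dr + "CC"
    print(extract_spacers(read, dr))
```

Because the DR is supplied rather than discovered, a read needs only two copies of the repeat to yield a spacer, which is one reason a reference-guided search can suit short metagenomic reads. A practical tool would also need to tolerate mismatches in the DR and handle reverse-complement reads, both of which this sketch omits.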