Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA.
Bioinformatics. 2018 Dec 15;34(24):4293-4296. doi: 10.1093/bioinformatics/bty548.
It is a non-trivial task to identify and design capture probes ('baits') for the diverse array of targeted-enrichment methods now available (e.g. ultra-conserved elements, anchored hybrid enrichment, RAD-capture). This often involves parsing large genomic alignments, followed by multiple steps of curating candidate genomic regions to optimize targeted information content (e.g. genetic variation) and to minimize potential probe dimerization and non-target enrichment.
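As a rough illustration of the curation steps described above, the sketch below scans an in-memory list of equal-length aligned sequences for bait-length windows. It keeps windows that have few unusable columns, a bounded number of variable sites, an acceptable GC fraction and no obvious self-complementarity. Every name and threshold here (`scan_windows`, `max_n`, `gc_range`, `min_var`, `max_var`, `dimer_k`) is hypothetical, and the k-mer reverse-complement test is only a crude stand-in for a dimerization check. It is not MrBait's interface or its actual filtering logic.

```python
from dataclasses import dataclass

# Hypothetical sketch of window-based candidate curation; not MrBait's API.

# Complement lookup for unambiguous bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass
class Candidate:
    start: int           # 0-based start column in the alignment
    consensus: str       # majority-rule consensus of the window
    variable_sites: int  # columns with more than one distinct base (gaps/N ignored)
    gc: float            # GC fraction of the unambiguous consensus bases


def column_bases(alignment, col):
    """Set of unambiguous bases (A/C/G/T) observed in one alignment column."""
    return {seq[col].upper() for seq in alignment if seq[col].upper() in "ACGT"}


def majority_base(alignment, col):
    """Most common A/C/G/T in a column (ties broken alphabetically); 'N' if none."""
    counts = {}
    for seq in alignment:
        base = seq[col].upper()
        if base in "ACGT":
            counts[base] = counts.get(base, 0) + 1
    if not counts:
        return "N"
    return max(sorted(counts), key=counts.get)


def has_self_complement(seq, k):
    """True if some k-mer's reverse complement also occurs in seq (crude dimer proxy)."""
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return any(km.translate(COMPLEMENT)[::-1] in kmers for km in kmers if "N" not in km)


def scan_windows(alignment, bait_len, max_n=0.1, gc_range=(0.3, 0.7),
                 min_var=0, max_var=2, dimer_k=8):
    """Slide a bait-length window along the alignment; keep windows passing all filters."""
    n_cols = len(alignment[0])
    consensus = "".join(majority_base(alignment, c) for c in range(n_cols))
    variable = [len(column_bases(alignment, c)) > 1 for c in range(n_cols)]
    candidates = []
    for start in range(n_cols - bait_len + 1):
        window = consensus[start:start + bait_len]
        n_count = window.count("N")
        # Skip windows with too many columns that have no usable base.
        if n_count == bait_len or n_count / bait_len > max_n:
            continue
        # Require some variation, but not so much that hybridization suffers.
        n_var = sum(variable[start:start + bait_len])
        if not min_var <= n_var <= max_var:
            continue
        gc = (window.count("G") + window.count("C")) / (bait_len - n_count)
        if not gc_range[0] <= gc <= gc_range[1]:
            continue
        # Drop windows that could pair with themselves.
        if has_self_complement(window, dimer_k):
            continue
        candidates.append(Candidate(start, window, n_var, gc))
    return candidates
```

For example, with the toy alignment `["ACGTTGCA----", "ACGATGCA----", "ACGTTGCA----"]`, calling `scan_windows(aln, bait_len=4, max_n=0.0, gc_range=(0.0, 1.0), min_var=1, max_var=4)` keeps the windows starting at columns 0, 1, 2 and 3. The all-gap tail and the invariant window at column 4 are filtered out.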
In this context, we developed MrBait, a user-friendly, generalized software pipeline for identification, design and optimization of targeted-enrichment probes across a range of target-capture paradigms. MrBait is an open-source codebase that leverages native parallelization capabilities in Python and mitigates memory usage via a relational-database back-end. Numerous filtering methods allow comprehensive optimization of designed probes, including built-in functionality that employs BLAST, similarity-based clustering and a graph-based algorithm that 'rescues' failed probes.
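The abstract does not describe how the graph-based rescue works. One plausible formulation, assumed here purely for illustration, treats candidate targets as nodes and joins any two that overlap or lie closer than a minimum distance. It then keeps a maximal independent set that favors higher-weight candidates, instead of discarding every candidate involved in a conflict. The weight could be, for example, the number of variable sites. The greedy heuristic below is not guaranteed to find the maximum-weight set. The names `conflict_graph`, `rescue`, `drop_all_conflicts`, `min_dist` and `weights` are hypothetical and do not come from MrBait.

```python
from collections import defaultdict

# Hypothetical sketch of conflict-graph "rescue"; not MrBait's actual algorithm or API.


def conflict_graph(intervals, min_dist):
    """Adjacency sets linking candidates whose half-open [start, end) intervals
    overlap or lie fewer than min_dist bases apart."""
    adj = defaultdict(set)
    ids = sorted(intervals)
    for i, a in enumerate(ids):
        s1, e1 = intervals[a]
        for b in ids[i + 1:]:
            s2, e2 = intervals[b]
            gap = max(s1, s2) - min(e1, e2)  # negative when the intervals overlap
            if gap < min_dist:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def drop_all_conflicts(intervals, min_dist):
    """Naive filter: discard every candidate involved in any conflict."""
    adj = conflict_graph(intervals, min_dist)
    return sorted(n for n in intervals if not adj[n])


def rescue(intervals, weights, min_dist):
    """Greedy weighted maximal independent set on the conflict graph:
    keep the heaviest remaining candidate, then exclude its neighbors."""
    adj = conflict_graph(intervals, min_dist)
    order = sorted(intervals, key=lambda n: (-weights[n], n))  # ties broken by id
    kept, excluded = [], set()
    for node in order:
        if node in excluded:
            continue
        kept.append(node)
        excluded.update(adj[node])
    return sorted(kept)
```

Consider the intervals `{"t1": (0, 100), "t2": (90, 200), "t3": (300, 400), "t4": (410, 500)}`, the weights `{"t1": 3, "t2": 5, "t3": 2, "t4": 1}` and `min_dist=20`. Here `drop_all_conflicts` keeps nothing, because t1/t2 overlap and t3/t4 are only 10 bases apart. In contrast, `rescue` returns `["t2", "t3"]`, keeping the heavier member of each conflicting pair.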
Complete code for MrBait is available on GitHub (https://github.com/tkchafin/mrbait), and is also available with all dependencies via one-line installation using the conda package manager. Online documentation describing installation and runtime instructions can be found at: https://mrbait.readthedocs.io.
Supplementary data are available at Bioinformatics online.