Department of Biology and Molecular Biology, Montclair State University, Montclair, NJ 07043, USA.
BMC Genomics. 2013 Oct 4;14:679. doi: 10.1186/1471-2164-14-679.
The advent of next-generation high-throughput technologies has revolutionized whole-genome sequencing, yet some experiments require sequencing only targeted regions of the genome from a very large number of samples. These regions can be amplified by PCR and sequenced by next-generation methods using a multidimensional pooling strategy. However, no generalized tool is currently available for the computational analysis of target-enriched NGS data from multidimensional pools.
Here we present InsertionMapper, a pipeline tool for the identification of targeted sequences from multidimensional high-throughput sequencing data. InsertionMapper consists of four independently working modules: Data Preprocessing, Database Modeling, Dimension Deconvolution and Element Mapping. We illustrate InsertionMapper with an example from our project 'New reverse genetics resources for maize', which aims to sequence-index a collection of 15,000 independent insertion sites of the transposon Ds in maize. Identified sequences are validated by PCR assays. This pipeline tool is applicable to similar scenarios that require analysis of the large volume of short reads produced in NGS experiments on targeted genome sequences.
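The idea behind dimension deconvolution can be illustrated with a minimal sketch. It is not InsertionMapper's actual implementation: the function name `deconvolve`, the input layout, and the rule "a sequence is resolved only if it appears in exactly one pool per dimension" are all assumptions made for illustration. Each sample is assumed to sit at one coordinate in a multidimensional grid and to contribute DNA to one pool along each dimension.

```python
from collections import defaultdict


def deconvolve(pool_hits, n_dims):
    """Assign sequences to grid coordinates from pool membership.

    pool_hits: dict mapping a sequence id to the set of (dimension, pool_index)
               pairs in which that sequence was observed (hypothetical layout).
    n_dims:    number of pooling dimensions.

    Returns (resolved, ambiguous):
      resolved  maps sequence id -> coordinate tuple, for sequences seen in
                exactly one pool along every dimension;
      ambiguous maps sequence id -> {dimension: sorted pool indices}, for
                sequences missing from a dimension or seen in several pools.
    """
    resolved, ambiguous = {}, {}
    for seq, pools in pool_hits.items():
        # Group the observed pool indices by dimension.
        by_dim = defaultdict(set)
        for dim, idx in pools:
            by_dim[dim].add(idx)

        if all(len(by_dim[d]) == 1 for d in range(n_dims)):
            # One pool per dimension pins the sequence to a single sample.
            resolved[seq] = tuple(next(iter(by_dim[d])) for d in range(n_dims))
        else:
            ambiguous[seq] = {d: sorted(by_dim[d]) for d in range(n_dims)}
    return resolved, ambiguous


if __name__ == "__main__":
    # Hypothetical observations in a 3-dimensional pooling design.
    hits = {
        "ins_A": {(0, 2), (1, 5), (2, 1)},          # one pool per dimension
        "ins_B": {(0, 2), (1, 5)},                  # never seen in dimension 2
        "ins_C": {(0, 1), (0, 3), (1, 4), (2, 0)},  # two pools in dimension 0
    }
    resolved, ambiguous = deconvolve(hits, n_dims=3)
    print(resolved)
    print(ambiguous)
```

In this sketch, sequences that fail the one-pool-per-dimension rule are reported rather than discarded, since a real analysis would likely need to handle them with read-count thresholds or follow-up assays such as PCR validation.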
InsertionMapper proved effective for the identification of target-enriched sequences from multidimensional high-throughput sequencing data. With adjustable parameters and experiment configurations, this tool can save considerable computational effort for biologists interested in identifying their sequences of interest within the huge output of modern DNA sequencers. InsertionMapper is freely accessible at https://sourceforge.net/p/insertionmapper and http://bo.csam.montclair.edu/du/insertionmapper.