Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA.
BMC Bioinformatics. 2010 Nov 23;11:572. doi: 10.1186/1471-2105-11-572.
Massively parallel sequencing readouts of epigenomic assays are enabling integrative genome-wide analyses of genomic and epigenomic variation. Pash 3.0 performs sequence comparison and read mapping and can be employed as a module within diverse configurable analysis pipelines, including ChIP-Seq and methylome mapping by whole-genome bisulfite sequencing.
Pash 3.0 generally matches the accuracy and speed of niche programs for fast mapping of short reads, and exceeds their performance on longer reads generated by a new generation of massively parallel sequencing technologies. By exploiting longer read lengths, Pash 3.0 maps reads onto the large fraction of genomic DNA that contains repetitive elements and polymorphic sites, including indel polymorphisms.
We demonstrate the versatility of Pash 3.0 by analyzing the interaction between CpG methylation, CpG SNPs, and imprinting based on publicly available whole-genome shotgun bisulfite sequencing data. Pash 3.0 makes use of gapped k-mer alignment, a non-seed based comparison method, which is implemented using multi-positional hash tables. This allows Pash 3.0 to run on diverse hardware platforms, including individual computers with standard RAM capacity, multi-core hardware architectures and large clusters.
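The idea behind gapped k-mer matching can be illustrated with a minimal sketch. It assumes a hypothetical sampling pattern `PATTERN`, where '1' marks a compared base and '0' a skipped base. Reference gapped k-mers are indexed in a single hash table that maps each key to all of its positions. Each read gapped k-mer that hits the table votes for a diagonal (reference offset minus read offset). The helper names `gapped_kmer`, `build_index`, and `candidate_diagonals` are illustrative only. This is not Pash 3.0's implementation, and it does not reproduce its multi-positional hash table scheme.

```python
from collections import defaultdict

# Hypothetical sampling pattern: '1' = base is compared, '0' = base is skipped.
PATTERN = "11011011"


def gapped_kmer(seq, start, pattern=PATTERN):
    """Return the gapped k-mer of seq at `start`, keeping only '1' positions."""
    return "".join(seq[start + i] for i, p in enumerate(pattern) if p == "1")


def build_index(reference, pattern=PATTERN):
    """Map each gapped k-mer to every reference offset where it occurs."""
    index = defaultdict(list)
    for start in range(len(reference) - len(pattern) + 1):
        index[gapped_kmer(reference, start, pattern)].append(start)
    return index


def candidate_diagonals(read, index, pattern=PATTERN):
    """Count gapped k-mer hits per diagonal (reference offset - read offset)."""
    votes = defaultdict(int)
    for r in range(len(read) - len(pattern) + 1):
        for ref_pos in index.get(gapped_kmer(read, r, pattern), []):
            votes[ref_pos - r] += 1
    # Best-supported diagonals (candidate mapping locations) first.
    return sorted(votes.items(), key=lambda kv: -kv[1])
```

For example, with `reference = "ACGTTGCAACGTAGGCTAGCTAACGGTACGT"` and the exact read `reference[10:26]`, all nine windows vote for diagonal 10. If the read carries a substitution at its eighth base, the windows that skip that base still vote for diagonal 10. This is why gapped patterns tolerate some mismatches that would break a contiguous k-mer.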