Guo Weilong, Fiziev Petko, Yan Weihong, Cokus Shawn, Sun Xueguang, Zhang Michael Q, Chen Pao-Yang, Pellegrini Matteo
Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, CA 90095, USA.
BMC Genomics. 2013 Nov 10;14:774. doi: 10.1186/1471-2164-14-774.
DNA methylation is an important epigenetic modification involved in many biological processes. Bisulfite treatment coupled with high-throughput sequencing provides an effective approach for studying genome-wide DNA methylation at base resolution. Library protocols such as whole genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) are widely used for generating DNA methylomes, which creates demand for efficient and versatile tools for aligning bisulfite sequencing data.
We have developed BS-Seeker2, an updated version of BS Seeker, as a full pipeline for mapping bisulfite sequencing data and generating DNA methylomes. BS-Seeker2 improves mappability over existing aligners by using local alignment. It can also map reads from RRBS libraries by building special indexes, with improved efficiency and accuracy. Moreover, BS-Seeker2 provides an additional function for filtering out reads with incomplete bisulfite conversion, which is useful in minimizing the overestimation of DNA methylation levels. We also defined the CGmap and ATCGmap file formats for full representations of DNA methylomes; these are part of the output of the BS-Seeker2 pipeline, together with BAM and WIG files.
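The idea behind filtering reads with incomplete bisulfite conversion can be illustrated with a minimal sketch. It is not BS-Seeker2's actual implementation: the function names, the gapless forward-strand alignment, and the threshold of three unconverted non-CpG cytosines are all assumptions made for illustration. The reasoning is that non-CpG cytosines are mostly unmethylated in many genomes, so a read that retains many of them as C has probably escaped conversion and would inflate methylation estimates.

```python
def unconverted_ch_count(read: str, reference: str) -> int:
    """Count non-CpG reference cytosines that are still C in the read.

    Assumes (hypothetically) a gapless forward-strand alignment in which
    read[i] is aligned to reference[i].
    """
    count = 0
    for i, (read_base, ref_base) in enumerate(zip(read, reference)):
        if ref_base != "C" or read_base != "C":
            continue  # not a cytosine site, or the cytosine was converted to T
        if i + 1 >= len(reference):
            continue  # sequence context unknown at the end of the window
        if reference[i + 1] != "G":
            count += 1  # unconverted cytosine in a non-CpG (CH) context
    return count


def passes_conversion_filter(read: str, reference: str,
                             max_unconverted_ch: int = 3) -> bool:
    """Keep a read only if it has few unconverted non-CpG cytosines.

    The default threshold is an illustrative assumption, not a value
    taken from BS-Seeker2.
    """
    return unconverted_ch_count(read, reference) <= max_unconverted_ch


if __name__ == "__main__":
    ref = "ACGTCATCCTACAG"
    converted = "ACGTTATTTTATAG"  # only the CpG cytosine remains C
    print(passes_conversion_filter(converted, ref))
    print(passes_conversion_filter(ref, ref))  # read identical to reference
```

A real filter would also have to handle reverse-strand reads (G-to-A conversion), gapped alignments, and organisms with substantial non-CpG methylation, where a count-based threshold could discard genuinely methylated reads.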
Our performance evaluations show that BS-Seeker2 works efficiently and accurately for both WGBS and RRBS data. BS-Seeker2 is freely available at http://pellegrini.mcdb.ucla.edu/BS_Seeker2/ and on the Galaxy server.