Baheti Saurabh, Kanwar Rahul, Goelzenleuchter Meike, Kocher Jean-Pierre A, Beutler Andreas S, Sun Zhifu
Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, 55905, USA.
Department of Medical Oncology, Mayo Clinic, Rochester, MN, 55905, USA.
BMC Genomics. 2016 Feb 27;17:149. doi: 10.1186/s12864-016-2494-8.
DNA methylation is an important epigenetic modification involved in many biological processes. Reduced representation bisulfite sequencing (RRBS) is a cost-effective method for studying DNA methylation at single-base resolution. Although several tools are available for RRBS data processing and analysis, it is not clear which strategy performs best, and little attention has been paid to contamination from artificial cytosines incorporated during the end-repair step of library preparation. To address these issues, we describe a new method, Targeted Alignment and Artificial Cytosine Elimination for RRBS (TRACE-RRBS), which aligns bisulfite sequencing reads to an MspI digitally digested reference and specifically removes the end-repair cytosines. We compared this approach with seven other RRBS analysis tools and the Illumina 450K microarray platform on a simulated and a real dataset.
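The idea of a digitally digested reference can be illustrated with a minimal Python sketch. It assumes a single uppercase top-strand sequence, cuts at the MspI site C^CGG, and keeps only fragments bounded by MspI sites on both ends whose length falls in a size window. The function name `mspi_digest` and the default window of 40 to 220 bp are illustrative assumptions, not TRACE-RRBS parameters.

```python
import re
from typing import List, Tuple


def mspi_digest(seq: str, min_len: int = 40, max_len: int = 220) -> List[Tuple[int, int, str]]:
    """Hypothetical in-silico MspI digestion of one top-strand sequence.

    MspI recognizes CCGG and cuts between the two Cs (C^CGG), so every
    MspI-MspI fragment starts with CGG and ends with the C of the next site.
    Returns (start, end, fragment) tuples with 0-based, end-exclusive
    coordinates for fragments whose length lies in [min_len, max_len].
    The default size window is an assumption for illustration only.
    """
    seq = seq.upper()
    # Cut positions: one base after the start of each CCGG site
    cuts = [m.start() + 1 for m in re.finditer("CCGG", seq)]
    fragments = []
    # Only segments bounded by MspI sites on both sides are RRBS fragments;
    # sequence before the first site and after the last site is ignored.
    for start, end in zip(cuts, cuts[1:]):
        if min_len <= end - start <= max_len:
            fragments.append((start, end, seq[start:end]))
    return fragments


genome = "AAACCGG" + "T" * 50 + "CCGGAAA"
for start, end, frag in mspi_digest(genome):
    print(start, end, frag[:3], frag[-1])
```

Aligning only against fragments like these, rather than the whole genome, is what shrinks the search space to the small fraction of the genome that RRBS actually sequences.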
TRACE-RRBS aligns sequence reads to the small fraction of the genome targeted by the RRBS protocol, and it was the fastest, most sensitive and most specific tool on the simulated dataset. On the real dataset, TRACE-RRBS took about the same time as RRBSMAP and a third to a sixth of the time needed by BISMARK and NOVOALIGN. TRACE-RRBS uniquely aligned more reads than the other tools and achieved the highest correlation with the 450K microarray data. Removing the end-repair artificial cytosines increased the correlation between nearby CpGs and the accuracy of methylation quantification.
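One way end-repair cytosines could be excluded after fragment-aware alignment is sketched below, for top-strand reads only. MspI leaves 5'-CG overhangs, and end repair fills in the recessed 3' ends with unmethylated nucleotides, so a top-strand read that runs through to its fragment's 3' end reports an artificial, unmethylated C at the downstream site's CpG. The `(position, methylated)` call format, the helper names and the top-strand-only scope are hypothetical; this is a sketch under those assumptions, not the TRACE-RRBS implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical call format: (0-based reference position of the CpG cytosine, methylated?)
Call = Tuple[int, bool]


def drop_end_repair_calls(calls: List[Call], frag_end: int) -> List[Call]:
    """Drop the end-repair cytosine from one top-strand read's calls.

    frag_end is the exclusive end of the read's digested fragment, i.e. the
    position of the second C in the downstream CCGG site. End repair fills in
    that C with unmethylated dCTP, so a call there reflects library
    preparation rather than the genome.
    """
    return [(pos, meth) for pos, meth in calls if pos != frag_end]


def methylation_levels(reads: List[List[Call]]) -> Dict[int, float]:
    """Per-CpG methylation level: methylated calls / all calls at that position."""
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # pos -> [methylated, total]
    for calls in reads:
        for pos, meth in calls:
            counts[pos][0] += int(meth)
            counts[pos][1] += 1
    return {pos: m / t for pos, (m, t) in counts.items()}


# Two toy reads from a fragment whose exclusive end is position 58
reads = [[(10, True), (58, False)], [(10, False), (58, False)]]
masked = [drop_end_repair_calls(r, frag_end=58) for r in reads]
print(methylation_levels(reads))   # {10: 0.5, 58: 0.0}
print(methylation_levels(masked))  # {10: 0.5}
```

The same CpG is also covered by reads from the downstream fragment, where its cytosine is genomic, so the mask is applied per read using that read's own fragment end rather than by discarding the CpG altogether.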
TRACE-RRBS is a fast tool for RRBS data analysis and is more accurate than the other tools tested. It is freely available for academic use at http://bioinformaticstools.mayo.edu/.