Toh Hidehiro, Shirane Kenjiro, Miura Fumihito, Kubo Naoki, Ichiyanagi Kenji, Hayashi Katsuhiko, Saitou Mitinori, Suyama Mikita, Ito Takashi, Sasaki Hiroyuki
Division of Epigenomics and Development, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan.
Department of Biochemistry, Kyushu University Graduate School of Medical Sciences, Fukuoka, Japan.
BMC Genomics. 2017 Jan 5;18(1):31. doi: 10.1186/s12864-016-3392-9.
Methylation of cytosine in genomic DNA is a well-characterized epigenetic modification involved in many cellular processes and diseases. Whole-genome bisulfite sequencing (WGBS) methods, such as MethylC-seq and post-bisulfite adaptor tagging sequencing (PBAT-seq), use high-throughput DNA sequencers to provide genome-wide DNA methylation profiles at single-base resolution. However, how the accuracy and consistency of WGBS outputs depend on the operating conditions of high-throughput sequencers has not been explored.
We used the Illumina HiSeq platform for our PBAT-based WGBS and found that different versions of HiSeq Control Software (HCS) and Real-Time Analysis (RTA) installed on the system gave different global CpG methylation levels (approximately 5% overall difference) for the same libraries. This problem was reproduced multiple times with different WGBS libraries and is likely associated with the low sequence diversity of bisulfite-converted DNA. We found that HCS was the major determinant of the observed differences. To determine which version of HCS is most suitable for WGBS, we used substrates with predetermined CpG methylation levels and found that HCS v2.0.5 performed best among the examined versions. HCS v2.0.12 showed the poorest performance and gave artificially low CpG methylation levels when 5-methylcytosine was read as guanine (first read of PBAT-seq and second read of MethylC-seq). In addition, paired-end sequencing of low-diversity libraries using HCS v2.2.38 or the latest HCS v2.2.58 was greatly affected by cluster density.
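To make the quantity being compared concrete, the following is a minimal sketch of how a global CpG methylation level, and its deviation from a substrate's predetermined level, might be computed from call counts. The `CpGCounts` type, the function names, and the numbers in the example are hypothetical illustrations and are not taken from the study; the sketch also assumes that calls have already been tallied in a strand-aware way upstream (for reads in which 5-methylcytosine appears as guanine, the supporting calls are G versus A rather than C versus T).

```python
from dataclasses import dataclass


@dataclass
class CpGCounts:
    """Aggregated calls over all covered CpG sites (hypothetical container)."""
    methylated: int    # calls supporting a methylated cytosine
    unmethylated: int  # calls supporting a converted (unmethylated) cytosine


def global_cpg_methylation(counts: CpGCounts) -> float:
    """Return methylated calls divided by all calls at CpG sites."""
    total = counts.methylated + counts.unmethylated
    if total == 0:
        raise ValueError("no CpG calls to summarize")
    return counts.methylated / total


def deviation_from_expected(counts: CpGCounts, expected_level: float) -> float:
    """Observed minus expected level; negative means underestimation."""
    return global_cpg_methylation(counts) - expected_level


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not data from the paper.
    example = CpGCounts(methylated=800, unmethylated=200)
    print(round(global_cpg_methylation(example), 3))
    print(round(deviation_from_expected(example, expected_level=0.85), 3))
```

Running the same summary on the same library sequenced under two software versions, and on a control substrate of known methylation level, is one simple way to see whether a difference of the size reported above is present.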
Software updates on the Illumina HiSeq platform can affect the outputs from low-diversity sequencing libraries such as WGBS libraries. More recent versions are not necessarily better, and HCS v2.0.5 is currently the best for WGBS among the examined HCS versions. Thus, in addition to other experimental conditions, the software version must be carefully controlled when CpG methylation levels are compared between different samples by WGBS.
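The low sequence diversity mentioned above follows from the chemistry of bisulfite treatment: unmethylated cytosines are converted to uracil and read as thymine, while methylated cytosines remain cytosine. The sketch below illustrates this on a single strand with a toy sequence; the function names, the sequence, and the methylated position are assumptions chosen for illustration and do not come from the study.

```python
from collections import Counter


def bisulfite_convert(seq: str, methylated_positions=frozenset()) -> str:
    """In-silico conversion of one strand: unmethylated C becomes T.

    Positions (0-based) listed in methylated_positions are protected.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def base_fractions(seq: str) -> dict:
    """Fraction of each of A, C, G, T in the sequence."""
    counts = Counter(seq)
    return {base: counts.get(base, 0) / len(seq) for base in "ACGT"}


if __name__ == "__main__":
    original = "ACGTCCGATCGA"  # toy sequence, for illustration only
    converted = bisulfite_convert(original, methylated_positions={1})
    print(original, base_fractions(original))
    print(converted, base_fractions(converted))
```

After conversion the strand is strongly depleted of C and enriched in T, so most sequencing cycles see an unbalanced base composition. Reads derived from the complementary strand show the mirror-image effect, with G depleted and A enriched.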