School of Information Science and Engineering, Central South University, Changsha 410083, China.
GrandOmics Biosciences, Beijing 102206, China.
Bioinformatics. 2019 Nov 1;35(22):4586-4595. doi: 10.1093/bioinformatics/btz276.
Oxford Nanopore sequencing enables the direct detection of base methylation states in DNA from reads, without additional laboratory techniques. Novel computational methods are needed to improve the accuracy and robustness of DNA methylation state prediction from Nanopore reads.
In this study, we develop DeepSignal, a deep learning method for detecting DNA methylation states from Nanopore sequencing reads. Tests on Nanopore reads of Homo sapiens (H. sapiens), Escherichia coli (E. coli) and pUC19 show that DeepSignal achieves higher performance than previous hidden Markov model (HMM)-based methods at both the read level and the genome level when detecting 6mA and 5mC methylation states. DeepSignal achieves similar performance across different DNA methylation bases, different DNA methylation motifs, and both singleton and mixed DNA CpGs. Moreover, DeepSignal requires much lower coverage than HMM- and statistics-based methods: it achieves above 90% accuracy for detecting 5mC and 6mA using only 2× read coverage. Furthermore, for DNA CpG methylation state prediction, DeepSignal achieves 90% correlation with bisulfite sequencing using just 20× read coverage, which is much better than HMM-based methods. Notably, DeepSignal can predict the methylation states of 5% more DNA CpGs that previously could not be predicted by bisulfite sequencing. DeepSignal can be a robust and accurate method for detecting methylation states of DNA bases.
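Genome-level comparisons such as the correlation with bisulfite sequencing are commonly computed by aggregating per-read calls into a per-site methylation frequency (methylated calls divided by total calls at that site) and then taking the Pearson correlation against bisulfite frequencies at shared sites. The sketch below illustrates only that aggregation step, under stated assumptions. It is not DeepSignal's own code. The input format of (site, is_methylated) pairs, the function names, the `min_coverage` filter and the toy values are all hypothetical. The sketch also assumes that both frequency vectors have non-zero variance.

```python
from collections import defaultdict
from math import sqrt


def site_frequencies(calls, min_coverage=1):
    """Aggregate read-level calls into per-site methylation frequencies.

    Hypothetical input: iterable of (site_id, is_methylated) pairs, one per read
    covering the site. Sites covered by fewer than min_coverage reads are dropped.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for site, is_meth in calls:
        total[site] += 1
        methylated[site] += int(is_meth)
    return {s: methylated[s] / total[s] for s in total if total[s] >= min_coverage}


def pearson(xs, ys):
    """Pearson correlation; assumes len >= 2 and non-zero variance in both lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_with_bisulfite(calls, bisulfite, min_coverage=1):
    """Correlate aggregated frequencies with bisulfite frequencies at shared sites."""
    freqs = site_frequencies(calls, min_coverage)
    shared = sorted(set(freqs) & set(bisulfite))
    return pearson([freqs[s] for s in shared], [bisulfite[s] for s in shared])


if __name__ == "__main__":
    # Toy, made-up read-level calls: two reads per site.
    calls = [
        ("chr1:100", True), ("chr1:100", True),
        ("chr1:200", False), ("chr1:200", True),
        ("chr1:300", False), ("chr1:300", False),
    ]
    bisulfite = {"chr1:100": 0.9, "chr1:200": 0.5, "chr1:300": 0.1}
    print(site_frequencies(calls))
    print(correlation_with_bisulfite(calls, bisulfite))
```

With these toy calls, the aggregated frequencies are 1.0, 0.5 and 0.0 for the three sites. Raising `min_coverage` mimics the effect of requiring a minimum read depth per site. Higher depth trades the number of sites evaluated against the reliability of each frequency estimate.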
DeepSignal is publicly available at https://github.com/bioinfomaticsCSU/deepsignal.
Supplementary data are available at Bioinformatics online.