Department of Clinical and Molecular Medicine, NTNU - Norwegian University of Science and Technology, P.O. Box 8905, NO-7491, Trondheim, Norway.
Clinic of Surgery, St. Olavs Hospital, Trondheim University Hospital, NO-7030, Trondheim, Norway.
Clin Epigenetics. 2019 Dec 12;11(1):193. doi: 10.1186/s13148-019-0795-x.
Sequencing technologies have changed not only our approaches to classical genetics, but also the field of epigenetics. Specific methods allow scientists to identify novel genome-wide epigenetic patterns of DNA methylation down to single-nucleotide resolution. DNA methylation is the most extensively studied epigenetic mark and is involved in various processes in the human cell, including gene regulation and the development of diseases such as cancer. Increasing numbers of DNA methylation sequencing datasets from the human genome are being produced using various platforms, from methylated DNA precipitation to whole-genome bisulfite sequencing. Many of these datasets are fully accessible for reanalysis. Sequencing experiments have become routine in laboratories around the world, while analysis of the resulting data is still a challenge for the majority of scientists, since in many cases it requires advanced computational skills. Even though various tools are being created and published, guidelines for their selection are often unclear, especially to non-bioinformaticians with limited experience in computational analyses. Separate tools are often used for individual steps in the analysis, and these can be challenging to manage and integrate. However, in some instances, tools are combined into pipelines that are capable of completing all the essential steps needed to obtain the result. In the case of DNA methylation sequencing analysis, the goal of such a pipeline is to map sequencing reads, calculate methylation levels, and identify differentially methylated positions and/or regions. The objective of this review is to describe the basic principles and steps in the analysis of DNA methylation sequencing data, particularly as applied to mammalian genomes, and, more importantly, to present and discuss the most prominent computational pipelines that can be used to analyze such data.
We aim to provide a good starting point for scientists with limited experience in computational analyses of DNA methylation and hydroxymethylation data, and to recommend a few tools that are powerful yet easy enough to use for their own data analysis.
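To make the last two pipeline steps (calculating methylation levels and identifying differentially methylated positions) more concrete, the following is a minimal sketch, not the method of any specific pipeline. It assumes reads have already been mapped and that, for each CpG, the numbers of reads supporting a methylated (unconverted C) and an unmethylated (converted, read as T) cytosine have been counted in each of two groups. The function names `methylation_level`, `fisher_exact_two_sided` and `call_dmps`, the thresholds (`min_diff=0.25`, `alpha=0.05`) and the toy counts are hypothetical illustrations. Per-position Fisher's exact testing is only one simple choice; dedicated tools typically use more elaborate statistical models and correct for multiple testing, which this sketch omits.

```python
from math import comb


def methylation_level(methylated: int, unmethylated: int) -> float:
    """Fraction of reads supporting methylation at one cytosine (a beta-like value)."""
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("no read coverage at this position")
    return methylated / total


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are methylated and unmethylated read counts.
    """
    row1, row2 = a + b, c + d
    col1 = a + c  # total methylated reads across both groups
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x methylated reads falling in group 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= observed * (1 + 1e-7))
    return min(1.0, total)


def call_dmps(sites, min_diff=0.25, alpha=0.05):
    """Return (position, methylation difference, p-value) for candidate DMPs.

    sites: dict mapping position -> ((meth1, unmeth1), (meth2, unmeth2)).
    Hypothetical thresholds; no multiple-testing correction is applied.
    """
    dmps = []
    for pos, ((m1, u1), (m2, u2)) in sorted(sites.items()):
        diff = methylation_level(m2, u2) - methylation_level(m1, u1)
        p = fisher_exact_two_sided(m1, u1, m2, u2)
        if p < alpha and abs(diff) >= min_diff:
            dmps.append((pos, round(diff, 3), p))
    return dmps


if __name__ == "__main__":
    # Hypothetical toy counts for two CpG positions in two groups
    toy_sites = {
        1000: ((18, 2), (3, 17)),
        1050: ((9, 11), (10, 10)),
    }
    for pos, diff, p in call_dmps(toy_sites):
        print(pos, diff, f"{p:.2e}")
```

Requiring both a statistical threshold and a minimum difference in methylation level is a common way to avoid reporting positions that are statistically significant only because of very high coverage; the specific cut-offs here are placeholders to be chosen for each study.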