An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data.

Author affiliations

Whitaker Biomedical Engineering Institute, Johns Hopkins University, Baltimore, MD, USA.

Center for Epigenetics, Johns Hopkins School of Medicine, Baltimore, MD, USA.

Publication information

BMC Bioinformatics. 2018 Mar 7;19(1):87. doi: 10.1186/s12859-018-2086-5.

DOI:10.1186/s12859-018-2086-5

PMID:29514626

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5842653/

Abstract

BACKGROUND

DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole-genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation because it produces high-resolution genome-wide methylation profiles. Statistical modeling and analysis are employed to extract and quantify information from these profiles computationally, in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, most currently available methods for methylation analysis cannot directly account for statistical dependencies between neighboring methylation sites, which hampers their performance and ignores significant information available in WGBS reads.

RESULTS

We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by using a joint probability model that encapsulates all information available in WGBS methylation reads, and it produces accurate results even when applied to single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it uses the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrates the clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data.
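The three ingredients named above (a 1D Ising joint distribution, the Shannon entropy, and the Jensen-Shannon distance) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the parameters `a` (per-site bias) and `c` (nearest-neighbor coupling), and their numeric values are hypothetical, and the joint distribution is obtained by brute-force enumeration over a handful of sites rather than by estimation from WGBS reads. The quantities are also computed over full methylation patterns, which is a simplification chosen for illustration.

```python
import itertools
import math


def ising_joint(a, c):
    """Joint distribution over binary methylation patterns of len(a) sites.

    Assumed toy parameterization: a[i] > 0 favors site i being methylated (1),
    c[i] > 0 favors sites i and i+1 sharing the same state.
    """
    n = len(a)
    if len(c) != n - 1:
        raise ValueError("need one coupling per adjacent pair of sites")
    weights = {}
    for x in itertools.product((0, 1), repeat=n):
        s = [2 * xi - 1 for xi in x]  # map {0, 1} to {-1, +1}
        energy = -sum(a[i] * s[i] for i in range(n))
        energy -= sum(c[i] * s[i] * s[i + 1] for i in range(n - 1))
        weights[x] = math.exp(-energy)
    z = sum(weights.values())  # partition function
    return {x: w / z for x, w in weights.items()}


def shannon_entropy(p):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base 2, so it lies in [0, 1])."""
    m = {x: 0.5 * (p[x] + q[x]) for x in p}
    jsd = shannon_entropy(m) - 0.5 * shannon_entropy(p) - 0.5 * shannon_entropy(q)
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding errors


# Hypothetical reference sample: no bias, no coupling (every pattern equally likely).
reference = ising_joint(a=[0.0] * 4, c=[0.0] * 3)
# Hypothetical test sample: biased toward methylation with positive coupling.
test = ising_joint(a=[0.5] * 4, c=[1.0] * 3)

print("entropy (reference):", shannon_entropy(reference))
print("entropy (test):", shannon_entropy(test))
print("JS distance:", jensen_shannon_distance(test, reference))
```

Enumeration scales as 2^n, so the sketch is only practical for a few sites at a time.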

CONCLUSIONS

This contribution demonstrates the clear benefits, and the necessity, of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. This methodology can substantially improve DNA methylation analysis by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods.
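To make concrete what a marginal (per-site) analysis can miss, the following toy sketch compares two made-up joint distributions over a pair of neighboring sites. Both have identical per-site methylation probabilities, yet only one carries dependence between the sites, which is measured here by the mutual information. The numbers and function names are illustrative assumptions and are not taken from the paper.

```python
import math


def site_marginals(joint):
    """Return (P(site 1 methylated), P(site 2 methylated)) for a two-site joint."""
    p1 = sum(p for (x1, _), p in joint.items() if x1 == 1)
    p2 = sum(p for (_, x2), p in joint.items() if x2 == 1)
    return p1, p2


def mutual_information(joint):
    """Mutual information in bits between the two sites of a joint distribution."""
    p1, p2 = site_marginals(joint)
    m1 = {0: 1.0 - p1, 1: p1}
    m2 = {0: 1.0 - p2, 1: p2}
    return sum(
        p * math.log2(p / (m1[x1] * m2[x2]))
        for (x1, x2), p in joint.items()
        if p > 0
    )


# Made-up example: neighboring sites methylated independently of each other.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
# Made-up example: neighboring sites that tend to share the same state.
correlated = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

for name, joint in (("independent", independent), ("correlated", correlated)):
    print(name, "marginals:", site_marginals(joint),
          "mutual information:", mutual_information(joint))
```

A method that looks only at per-site methylation levels sees the same values in both cases, while the joint distributions differ.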

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/10f7/5842653/f50278fd3437/12859_2018_2086_Fig1_HTML.jpg
