Ranking genomic features using an information-theoretic measure of epigenetic discordance.

Affiliations

Whitaker Biomedical Engineering Institute, Johns Hopkins University, Baltimore, MD, USA.

Center for Epigenetics, Johns Hopkins School of Medicine, Baltimore, MD, USA.

Publication information

BMC Bioinformatics. 2019 Apr 8;20(1):175. doi: 10.1186/s12859-019-2777-6.

DOI:10.1186/s12859-019-2777-6

PMID:30961526

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6454630/

Abstract

BACKGROUND

Establishment and maintenance of DNA methylation throughout the genome is an important epigenetic mechanism that regulates gene expression, and its disruption has been implicated in human diseases such as cancer. It is therefore crucial to know which genes, or other genomic features of interest, exhibit significant discordance in DNA methylation between two phenotypes. We have previously proposed an approach for ranking genes based on methylation discordance within their promoter regions, which are determined by centering a window of fixed size at each gene's transcription start site. However, this method can neither identify statistically significant genomic features nor handle features of variable length or with missing data.

RESULTS

We present a new approach for computing the statistical significance of methylation discordance within genomic features of interest in single and multiple test/reference studies. We base the proposed method on a well-articulated hypothesis testing problem that produces p- and q-values for each genomic feature, which we then use to identify and rank features based on the statistical significance of their epigenetic dysregulation. We employ the information-theoretic concept of mutual information to derive a novel test statistic, which we can evaluate by computing Jensen-Shannon distances between the probability distributions of methylation in a test and a reference sample. We design the proposed methodology to simultaneously handle biological, statistical, and technical variability in the data, as well as variable feature lengths and missing data, thus enabling its widespread use on any list of genomic features. We accomplish this by estimating, from reference data, the null distribution of the test statistic as a function of feature length using generalized additive regression models. Differential assessment, using normal/cancer data from healthy fetal tissue and pediatric high-grade glioma patients, illustrates the potential of our approach to greatly facilitate the exploratory phases of clinically and biologically relevant methylation studies.
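To make two of the ingredients named above concrete, the following is a minimal Python sketch, not the authors' implementation. It computes a Jensen-Shannon distance between two discrete probability distributions and converts a list of p-values into q-values. The Benjamini-Hochberg adjustment is an assumption chosen for illustration, since the abstract does not state which procedure produces the q-values. The function names and the toy numbers are hypothetical, and the paper's test statistic and its null model based on generalized additive regression are not reproduced here.

```python
import math


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance: square root of the base-2 JS divergence.

    p and q are discrete probability distributions over the same support
    (for example, hypothetical distributions of methylation levels).
    The result lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same support")

    def kl(a, b):
        # Kullback-Leibler divergence in bits; zero-probability terms add 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed q-value procedure)."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


if __name__ == "__main__":
    # Hypothetical test and reference distributions over three methylation bins.
    test_dist = [0.7, 0.2, 0.1]
    ref_dist = [0.1, 0.2, 0.7]
    print(jensen_shannon_distance(test_dist, ref_dist))

    # Hypothetical per-feature p-values; rank features by q-value.
    features = ["featureA", "featureB", "featureC", "featureD"]
    pvals = [0.01, 0.04, 0.03, 0.005]
    qvals = benjamini_hochberg(pvals)
    for name, q in sorted(zip(features, qvals), key=lambda t: t[1]):
        print(name, q)
```

The distance is 0 for identical distributions and 1 for distributions with disjoint support, which makes it a convenient bounded measure of discordance. In this sketch the p-values are given; in the method described above they would come from comparing the test statistic with a null distribution estimated from reference data.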

CONCLUSIONS

The proposed approach provides the first computational tool for statistically testing and ranking genomic features of interest based on observed DNA methylation discordance in comparative studies that accounts, in a rigorous manner, for biological, statistical, and technical variability in methylation data, as well as for variability in feature length and for missing data.
