Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA.
Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA.
Bioinformatics. 2019 Oct 1;35(19):3635-3641. doi: 10.1093/bioinformatics/btz137.
The accumulation of publicly available DNA methylation datasets has created a need for tools that interpret specific cellular phenotypes in bulk tissue data. Current approaches use either single differentially methylated CpG sites or differentially methylated regions that map to genes. However, because gene length varies, these approaches may bias downstream biological interpretation. Approaches that interpret DNA methylation effectively are lacking. We have therefore developed computational models that provide biological interpretation of relevant gene sets from DNA methylation data in the context of The Cancer Genome Atlas.
We show that Biological interpretation of DNA Methylation (BioMethyl) uses the complete DNA methylation data for a given cancer type to reflect the corresponding gene expression profiles and performs pathway enrichment analyses, providing unique biological insight. Using breast cancer as an example, the enriched biological pathways that BioMethyl identifies from DNA methylation data are highly consistent with those calculated from RNA sequencing (RNA-seq) data. For estrogen receptor (ER) positive samples, 12 of the 14 pathways identified by BioMethyl are shared with those identified from RNA-seq data (Jaccard score = 0.8). For ER negative samples, three pathways are shared between the two enrichments, with slightly lower similarity (Jaccard score = 0.6). BioMethyl can thus identify biological pathways hidden in DNA methylation data when gene expression profiles are lacking.
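The Jaccard score used above to compare the two enrichments can be sketched as follows. This is a minimal illustration that assumes the standard definition (size of the intersection divided by size of the union); the abstract does not state how the score was computed. The pathway names are placeholders, and the RNA-seq set size of 13 is an inference chosen so that 12 shared pathways out of 14 reproduce a score of 0.8, not a figure reported in the text.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen here: two empty sets have similarity 0.0.
        return 0.0
    return len(a & b) / len(union)


# Placeholder pathway names (hypothetical, for illustration only).
shared = {f"pathway_{i}" for i in range(1, 13)}          # 12 shared pathways
methyl = shared | {"methyl_only_1", "methyl_only_2"}     # 14 from methylation
rna = shared | {"rna_only_1"}                            # 13 assumed from RNA-seq

score = jaccard(methyl, rna)
print(round(score, 2))
```

Under these assumptions the union holds 15 pathways, so the score is 12/15.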
The BioMethyl R package is freely available on GitHub (https://github.com/yuewangpanda/BioMethyl).
Supplementary data are available at Bioinformatics online.