Äijö Tarmo, Yue Xiaojing, Rao Anjana, Lähdesmäki Harri
Center for Computational Biology, Simons Foundation, New York, NY 10010, USA; Department of Computer Science, Aalto University School of Science, Aalto FI-00076, Finland.
La Jolla Institute for Allergy and Immunology, La Jolla, CA 92037, USA.
Bioinformatics. 2016 Sep 1;32(17):i511-i519. doi: 10.1093/bioinformatics/btw468.
5-methylcytosine (5mC) is a widely studied epigenetic modification of DNA. The ten-eleven translocation (TET) dioxygenases oxidize 5mC into oxidized methylcytosines (oxi-mCs): 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC). DNA methylation modifications have multiple functions. For example, 5mC has been shown to be associated with diseases, and oxi-mC species are reported to have a role in active DNA demethylation through 5mC oxidation and DNA repair, among others, but the detailed mechanisms are poorly understood. Bisulphite sequencing and its various derivatives can be used to gain information about all methylation modifications at single-nucleotide resolution. Analysis of bisulphite-based sequencing data is complicated by the convoluted read-outs and by experiment-specific variation in biochemistry. Moreover, statistical analysis is often complicated by various confounding effects. How to analyse 5mC and oxi-mC data sets with arbitrary and complex experimental designs is an open and important problem.
We propose the first method to quantify oxi-mC species with arbitrary covariate structures from bisulphite-based sequencing data. Our probabilistic modeling framework combines a previously proposed hierarchical generative model for oxi-mC-seq data and a general linear model component to account for confounding effects. We show that our method provides accurate methylation level estimates and accurate detection of differential methylation when compared with existing methods. Analysis of novel and published data gave insights into the demethylation of the forkhead box P3 (Foxp3) locus during induced regulatory T cell differentiation. We also demonstrate how our covariate model accurately predicts methylation levels of the Foxp3 locus. Collectively, the LuxGLM method improves the analysis of DNA methylation modifications, particularly for oxi-mC species.
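To make the idea of combining a linear covariate model with a sequencing read-out concrete, the following is a minimal illustrative sketch and not the LuxGLM implementation. It assumes a softmax link from a design matrix to per-sample proportions of three species only (unmodified C, 5mC, 5hmC), and it assumes ideal chemistry, in which 5mC and 5hmC both read as "C" in bisulphite sequencing (BS-seq) while only 5mC reads as "C" in oxidative bisulphite sequencing (oxBS-seq). The function names, the design matrix and the coefficient values are hypothetical; the actual model additionally accounts for experiment-specific conversion efficiencies and errors and is fitted by probabilistic inference.

```python
import math


def softmax(values):
    """Map real-valued scores to proportions that sum to one."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def methylation_proportions(design, coeffs):
    """Return one [C, 5mC, 5hmC] proportion vector per sample.

    design: samples x covariates matrix (list of rows).
    coeffs: covariates x 3 coefficient matrix (hypothetical values).
    """
    proportions = []
    for row in design:
        linear = [
            sum(row[k] * coeffs[k][j] for k in range(len(row)))
            for j in range(3)
        ]
        proportions.append(softmax(linear))
    return proportions


def ideal_readout_probs(theta):
    """Probability of observing a 'C' read under ideal (assumed) chemistry."""
    _c, mc, hmc = theta
    return {"BS": mc + hmc, "oxBS": mc}


# Hypothetical design: an intercept plus one binary covariate.
design = [[1.0, 0.0], [1.0, 1.0]]
coeffs = [[0.0, 1.0, -1.0], [0.5, -1.0, 0.5]]

for theta in methylation_proportions(design, coeffs):
    print([round(t, 3) for t in theta], ideal_readout_probs(theta))
```

Under these assumptions, the difference between the BS-seq and oxBS-seq read-out probabilities equals the 5hmC proportion, which is why the two experiments together can separate 5mC from 5hmC.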
An implementation of the proposed method is available under MIT license at https://github.org/tare/LuxGLM/
Contact: taijo@simonsfoundation.org or harri.lahdesmaki@aalto.fi
Supplementary data are available at Bioinformatics online.