A hidden Markov model for identifying differentially methylated sites in bisulfite sequencing data.

Authors

Shokoohi Farhad, Stephens David A, Bourque Guillaume, Pastinen Tomi, Greenwood Celia M T, Labbe Aurélie

Affiliations

Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada.

Department of Mathematics and Statistics, McGill University, Montreal, Quebec, Canada.

Publication information

Biometrics. 2019 Mar;75(1):210-221. doi: 10.1111/biom.12965. Epub 2018 Oct 9.

Abstract

DNA methylation studies have enabled researchers to understand methylation patterns and their regulatory roles in biological processes and disease. However, only a limited number of statistical approaches have been developed to provide formal quantitative analysis. Specifically, a few available methods do identify differentially methylated CpG (DMC) sites or regions (DMR), but they suffer from limitations that arise mostly due to challenges inherent in bisulfite sequencing data. These challenges include: (1) that read-depths vary considerably among genomic positions and are often low; (2) both methylation and autocorrelation patterns change as regions change; and (3) CpG sites are distributed unevenly. Furthermore, there are several methodological limitations: almost none of these tools is capable of comparing multiple groups and/or working with missing values, and only a few allow continuous or multiple covariates. The last of these is of great interest among researchers, as the goal is often to find which regions of the genome are associated with several exposures and traits. To tackle these issues, we have developed an efficient DMC identification method based on Hidden Markov Models (HMMs) called "DMCHMM" which is a three-step approach (model selection, prediction, testing) aiming to address the aforementioned drawbacks. Our proposed method is different from other HMM methods since it profiles methylation of each sample separately, hence exploiting inter-CpG autocorrelation within samples, and it is more flexible than previous approaches by allowing multiple hidden states. Using simulations, we show that DMCHMM has the best performance among several competing methods. An analysis of cell-separated blood methylation profiles is also provided.
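The idea of profiling each sample separately with an HMM can be illustrated with a small sketch. This is not the DMCHMM implementation: it assumes a fixed set of hidden methylation levels (`state_probs`), a simple "sticky" transition matrix controlled by a hypothetical `stay` parameter, and binomial read counts, whereas the abstract describes model selection and a testing step that are not reproduced here. The function name and all parameters are illustrative assumptions. It only shows how forward-backward smoothing borrows information from neighbouring CpG sites, including at a site with zero read depth.

```python
import numpy as np

def smooth_methylation(meth, depth, state_probs, stay=0.9):
    """Posterior mean methylation level per CpG site for ONE sample.

    meth, depth : methylated read counts and total read depths per site
                  (depth 0 marks a missing site).
    state_probs : assumed methylation probability of each hidden state.
    stay        : assumed probability of remaining in the same state.
    """
    meth = np.asarray(meth, dtype=float)
    depth = np.asarray(depth, dtype=float)
    p = np.asarray(state_probs, dtype=float)
    K, T = p.size, meth.size

    # Sticky transition matrix: stay with prob `stay`, else move uniformly.
    A = np.full((K, K), (1.0 - stay) / (K - 1))
    np.fill_diagonal(A, stay)
    pi = np.full(K, 1.0 / K)

    # Binomial log-likelihood per site and state, up to a constant that
    # does not depend on the state. A site with depth 0 gives equal
    # likelihood to every state, so it is informed only by its neighbours.
    log_b = (meth[:, None] * np.log(p)[None, :]
             + (depth - meth)[:, None] * np.log1p(-p)[None, :])
    b = np.exp(log_b - log_b.max(axis=1, keepdims=True))

    # Scaled forward pass.
    alpha = np.zeros((T, K))
    alpha[0] = pi * b[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * b[t]
        alpha[t] /= alpha[t].sum()

    # Scaled backward pass.
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (b[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    # Posterior state probabilities, then the posterior mean level.
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma @ p

# Toy data: a highly methylated stretch (with one missing site) followed
# by an unmethylated stretch.
est = smooth_methylation(
    meth=[9, 8, 0, 10, 1, 0, 0],
    depth=[10, 10, 0, 10, 10, 10, 8],
    state_probs=[0.1, 0.5, 0.9],
)
print(np.round(est, 3))
```

In this toy input the third site has no reads, so its estimate comes entirely from the flanking sites through the transition matrix. A full analysis along the lines of the abstract would estimate the number of states and the parameters from the data and then test the smoothed profiles for group differences.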
