A Flexible, Efficient Binomial Mixed Model for Identifying Differential DNA Methylation in Bisulfite Sequencing Data.

Author information

Lea Amanda J, Tung Jenny, Zhou Xiang

Affiliations

Department of Biology, Duke University, Durham, North Carolina, United States of America.

Institute of Primate Research, National Museums of Kenya, Karen, Nairobi, Kenya.

Publication information

PLoS Genet. 2015 Nov 24;11(11):e1005650. doi: 10.1371/journal.pgen.1005650. eCollection 2015 Nov.

Abstract

Identifying sources of variation in DNA methylation levels is important for understanding gene regulation. Recently, bisulfite sequencing has become a popular tool for investigating DNA methylation levels. However, modeling bisulfite sequencing data is complicated by dramatic variation in coverage across sites and individual samples, and by the computational challenges of controlling for genetic covariance in count data. To address these challenges, we present a binomial mixed model and an efficient, sampling-based algorithm (MACAU: Mixed model association for count data via data augmentation) for approximate parameter estimation and p-value computation. This framework allows us to account simultaneously for the over-dispersed, count-based nature of bisulfite sequencing data and for genetic relatedness among individuals. Using simulations and two real data sets (whole genome bisulfite sequencing (WGBS) data from Arabidopsis thaliana and reduced representation bisulfite sequencing (RRBS) data from baboons), we show that our method provides well-calibrated test statistics in the presence of population structure. Further, it improves power to detect differentially methylated sites: in the RRBS data set, MACAU detected 1.6-fold more age-associated CpG sites than a beta-binomial model (the next best approach). Changes in these sites are consistent with known age-related shifts in DNA methylation levels, and the sites are enriched near genes that are differentially expressed with age in the same population. Taken together, our results indicate that MACAU is an efficient, effective tool for analyzing bisulfite sequencing data, with particular salience to analyses of structured populations. MACAU is freely available at www.xzlab.org/software.html.
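
To make the modeling problem concrete, the Python sketch below simulates methylated read counts at a single site, with coverage that varies across individuals, a random effect shared among relatives, and an independent random effect that produces over-dispersion. Everything specific in it is an assumption chosen for illustration and is not stated in the abstract: the logit link, the relatedness matrix `K`, the variance split controlled by `h2`, the negative binomial coverage model, and all function and variable names. It only generates data of the kind described; it does not implement MACAU's sampling-based estimation or its p-value computation.

```python
import numpy as np

def simulate_site(K, x, beta, sigma2, h2, mean_coverage, intercept=0.0, seed=0):
    """Simulate (methylated, total) read counts at one site. Illustrative only."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]

    # Total reads per individual: over-dispersed around mean_coverage,
    # shifted by 1 so that no individual has zero coverage.
    p = 2.0 / (2.0 + mean_coverage)
    total = rng.negative_binomial(2, p, size=n) + 1

    # Random effect that is correlated among related individuals (assumed form).
    g = rng.multivariate_normal(np.zeros(n), sigma2 * h2 * K)

    # Independent random effect: extra-binomial (over-dispersed) variation.
    e = rng.normal(0.0, np.sqrt(sigma2 * (1.0 - h2)), size=n)

    # Methylation probability on the logit scale, then back-transformed.
    eta = intercept + beta * x + g + e
    pi = 1.0 / (1.0 + np.exp(-eta))

    # Methylated reads are binomial draws out of the total reads.
    methylated = rng.binomial(total, pi)
    return methylated, total

# Hypothetical example: six individuals, the first two closely related.
K = np.eye(6)
K[0, 1] = K[1, 0] = 0.5
age = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
methylated, total = simulate_site(K, age, beta=0.2, sigma2=1.0, h2=0.5,
                                  mean_coverage=20)
print(np.column_stack([methylated, total]))
```

Setting `h2` close to 1 places most of the extra variance in the component shared by relatives, while setting it close to 0 places it in the independent component, which is one simple way to explore how relatedness and over-dispersion each shape the counts.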

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/480f/4657956/02348c2656c6/pgen.1005650.g001.jpg
