Fu Audrey Qiuyan, Genereux Diane P, Stöger Reinhard, Laird Charles D, Stephens Matthew
University of Washington.
Ann Appl Stat. 2010;4(2):871-892. doi: 10.1214/09-AOAS297SUPPA.
We develop Bayesian inference methods for a recently emerging type of epigenetic data to study the transmission fidelity of DNA methylation patterns over cell divisions. The data consist of parent-daughter double-stranded DNA methylation patterns, with each pattern coming from a single cell and represented as an unordered pair of binary strings. The data are technically difficult and time-consuming to collect, putting a premium on an efficient inference method. Our aim is to estimate rates for the maintenance and de novo methylation events that gave rise to the observed patterns, while accounting for measurement error. We model data at multiple sites jointly, thus using whole-strand information and considerably reducing confounding between parameters. We also adopt a hierarchical structure that allows for variation in rates across sites without an explosion in the effective number of parameters. Our context-specific priors capture the expected stationarity, or near-stationarity, of the stochastic process that generated the data analyzed here. This expected stationarity is shown to greatly increase the precision of the estimation. Applying our model to a data set collected at the human FMR1 locus, we find that measurement errors, generally ignored in similar studies, occur at a non-trivial rate (inappropriate bisulfite conversion error: 1.6%, with 80% CI: 0.9-2.3%). Accounting for these errors has a substantial impact on estimates of key biological parameters. The estimated average failure-of-maintenance rate and daughter de novo rate decline from 0.04 to 0.024 and from 0.14 to 0.07, respectively, when errors are accounted for. Our results also provide evidence that de novo events may occur on both parent and daughter strands: the median parent and daughter de novo rates are 0.08 (80% CI: 0.04-0.13) and 0.07 (80% CI: 0.04-0.11), respectively.
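To give a feel for why ignoring measurement error can inflate the failure-of-maintenance estimate, here is a minimal back-of-the-envelope sketch. It is not the authors' model: it assumes a single site whose parent strand is known to be methylated (the real data are unordered strand pairs), treats errors as independent of the true state, and introduces a hypothetical failed-conversion rate `failed_conv` that the abstract does not report (set to 0 here). The function name and parameters are illustrative only.

```python
def apparent_failure_rate(true_failure, inappropriate_conv, failed_conv=0.0):
    """Probability that a daughter site is *observed* as unmethylated,
    given that the parent site is truly methylated.

    true_failure       -- true failure-of-maintenance rate
    inappropriate_conv -- P(methylated site is read as unmethylated)
    failed_conv        -- P(unmethylated site is read as methylated);
                          hypothetical here, not reported in the abstract
    """
    # Case 1: maintenance truly failed and the site is read correctly.
    truly_unmethylated = true_failure * (1.0 - failed_conv)
    # Case 2: maintenance succeeded but the site is misread as unmethylated.
    misread_methylated = (1.0 - true_failure) * inappropriate_conv
    return truly_unmethylated + misread_methylated


# Values taken from the abstract: error-corrected failure rate 0.024,
# inappropriate bisulfite conversion error 1.6%.
rate = apparent_failure_rate(true_failure=0.024, inappropriate_conv=0.016)
print(round(rate, 4))
```

Under these simplifying assumptions the apparent rate comes out close to the uncorrected estimate of 0.04 quoted in the abstract, which is at least consistent with the reported drop to 0.024 once errors are modeled; the full analysis relies on the joint multi-site hierarchical model rather than this one-line calculation.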