Center for Biomarker Research & Personalized Medicine, School of Pharmacy, Virginia Commonwealth University, Richmond, VA 23298, USA.
Epigenomics. 2012 Dec;4(6):605-21. doi: 10.2217/epi.12.59.
AIM: We studied the use of methyl-CpG binding domain (MBD) protein-enriched genome sequencing (MBD-seq) as a cost-effective screening tool for methylome-wide association studies (MWAS).
MATERIALS & METHODS: Because MBD-seq has not yet been applied on a large scale, we first developed and tested a pipeline for data processing using 1500 schizophrenia cases and controls plus 75 technical replicates with an average of 68 million reads per sample. This involved the use of technical replicates to optimize quality control for multi- and duplicate-reads, an in silico experiment to identify CpGs in loci with alignment problems, CpG coverage calculations based on multiparametric estimates of the fragment size distribution, a two-stage adaptive algorithm to combine data from correlated adjacent CpG sites, principal component analyses to control for confounders and new software tailored to handle the large data set.
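The idea behind fragment-size-based CpG coverage can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the fragment size distribution is already known (the study estimated it multiparametrically), and the names `fragment_survival` and `cpg_coverage` are hypothetical. Each read marks only the 5' end of a sequenced fragment, so a read contributes to a CpG's coverage in proportion to the probability that its fragment is long enough to reach that CpG.

```python
from typing import Dict, List, Tuple


def fragment_survival(length_probs: Dict[int, float]) -> List[float]:
    """Return S where S[d] = P(fragment length > d), for d = 0..max length.

    `length_probs` maps fragment length (bp) to its relative frequency;
    it is an assumed input, normalized here so the weights sum to 1.
    """
    max_len = max(length_probs)
    total = sum(length_probs.values())
    survival = []
    remaining = 1.0
    for d in range(max_len + 1):
        remaining -= length_probs.get(d, 0.0) / total
        survival.append(max(remaining, 0.0))  # guard against float drift
    return survival


def cpg_coverage(cpg_pos: int,
                 reads: List[Tuple[int, str]],
                 survival: List[float]) -> float:
    """Estimate coverage of one CpG as a sum of per-read probabilities.

    Each read is (five_prime_position, strand). A '+' read's fragment
    extends to higher coordinates, a '-' read's fragment to lower ones.
    """
    coverage = 0.0
    for five_prime, strand in reads:
        # Distance from the fragment's 5' end to the CpG, in fragment direction.
        d = cpg_pos - five_prime if strand == "+" else five_prime - cpg_pos
        if 0 <= d < len(survival):
            coverage += survival[d]  # P(fragment reaches the CpG)
    return coverage


# Toy distribution (an assumption): half the fragments are 100 bp, half 200 bp.
survival = fragment_survival({100: 0.5, 200: 0.5})
reads = [(1000, "+"), (1150, "-"), (900, "+"), (1300, "+")]
estimate = cpg_coverage(1050, reads, survival)
```

In this toy case the first read is certain to cover the CpG, the second and third cover it only if their fragments are 200 bp long, and the fourth points away from it, so reads near a CpG count more than distant ones rather than all reads counting equally.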
RESULTS: We replicated MWAS findings in independent samples using a different technology that provided single base resolution. In an MWAS of age-related methylation changes, one of our top findings was a previously reported robust association involving GRIA2. Our results also suggested that owing to the many confounding effects, a considerable challenge in MWAS is to identify those effects that are informative about disease processes.
CONCLUSION: This study showed the potential of MBD-seq as a cost-effective tool in large-scale disease studies.