

"Gap hunting" to characterize clustered probe signals in Illumina methylation array data.

Author information

Andrews Shan V, Ladd-Acosta Christine, Feinberg Andrew P, Hansen Kasper D, Fallin M Daniele

Affiliations

Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, 615 N. Wolfe Street, Baltimore, MD 21205 USA.

Wendy Klag Center for Autism and Developmental Disabilities, Johns Hopkins Bloomberg School of Public Health, 615 N. Wolfe Street, Baltimore, MD 21205 USA.

Publication information

Epigenetics Chromatin. 2016 Dec 7;9:56. doi: 10.1186/s13072-016-0107-z. eCollection 2016.

Abstract

BACKGROUND

The Illumina 450k array has been widely used in epigenetic association studies. Current quality-control (QC) pipelines typically remove certain sets of probes, such as those containing a SNP or with multiple mapping locations. An additional set of potentially problematic probes are those with DNA methylation distributions characterized by two or more distinct clusters separated by gaps. Data-driven identification of such probes may offer additional insights for downstream analyses.

RESULTS

We developed a procedure, termed "gap hunting," to identify probes showing clustered distributions. Among 590 peripheral blood samples from the Study to Explore Early Development, we identified 11,007 "gap probes." The vast majority (9199) are likely attributable to one or more underlying SNPs or other variants in the probe, although some SNP-affected probes do not produce gap signals. Specific factors predict which SNPs lead to gap signals, including the type of nucleotide change, probe type, DNA strand, and overall methylation state. These expected effects are demonstrated in paired genotype and 450k data from the same samples. Gap probes can also serve as a surrogate for the local genetic sequence on a haplotype scale and can be used to adjust for population stratification.
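A minimal Python sketch of the gap-hunting idea described above, not the authors' implementation. It assumes one list of beta values per probe, treats any difference of at least 0.05 between consecutive sorted values as a gap (the threshold is an assumption here), and uses a hypothetical `min_group_frac` rule so that gaps created by only a few outlying samples are not counted. The published procedure may define gaps and outliers differently.

```python
from typing import Dict, List, Sequence


def find_gaps(betas: Sequence[float], threshold: float = 0.05) -> List[int]:
    """Assign each sample a cluster label by splitting the sorted beta values
    wherever two consecutive values differ by at least `threshold`."""
    order = sorted(range(len(betas)), key=lambda i: betas[i])
    labels = [0] * len(betas)
    group = 0
    for prev, cur in zip(order, order[1:]):
        if betas[cur] - betas[prev] >= threshold:
            group += 1  # a gap starts a new cluster
        labels[cur] = group
    return labels


def gap_hunt(
    data: Dict[str, Sequence[float]],
    threshold: float = 0.05,       # assumed gap size, in beta-value units
    min_group_frac: float = 0.01,  # hypothetical outlier rule, not from the paper
) -> Dict[str, List[int]]:
    """Return {probe: cluster labels} for probes with at least two clusters
    that each contain at least `min_group_frac` of the samples."""
    flagged = {}
    for probe, betas in data.items():
        if not betas:
            continue  # skip probes with no data
        labels = find_gaps(betas, threshold)
        sizes = [labels.count(g) for g in range(max(labels) + 1)]
        large = sum(1 for s in sizes if s >= min_group_frac * len(betas))
        if large >= 2:
            flagged[probe] = labels
    return flagged
```

For example, with the made-up probes `gap_hunt({"cg_A": [0.10, 0.12, 0.11, 0.55, 0.57, 0.56], "cg_B": [0.40, 0.42, 0.44, 0.43, 0.41, 0.45]})` returns `{'cg_A': [0, 0, 0, 1, 1, 1]}`: cg_A splits into two clusters, while cg_B is unimodal. The per-sample labels are kept because, as noted above, the clusters often track an underlying genotype.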

CONCLUSIONS

The characteristics of gap probes reflect potentially informative biology. QC pipelines may benefit from an efficient, data-driven approach that "flags" gap probes rather than filtering them out, followed by careful interpretation of downstream association analyses. Given the similar chemistry and content design, our results should translate directly to the recently released Illumina EPIC array.
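To make the "flag rather than filter" recommendation concrete, here is a small Python sketch: gap probes stay in the association results with a boolean annotation, and flagged significant hits are pulled out for closer review. The `ProbeResult` record, the probe names, and the `alpha` cutoff are hypothetical choices for illustration, not part of any published pipeline.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Set


@dataclass(frozen=True)
class ProbeResult:
    probe: str
    p_value: float
    gap_flag: bool = False  # True if the probe was identified as a gap probe


def flag_gap_probes(results: Iterable[ProbeResult], gap_probes: Set[str]) -> List[ProbeResult]:
    """Keep every result, but mark gap probes instead of filtering them out."""
    return [replace(r, gap_flag=r.probe in gap_probes) for r in results]


def hits_to_review(results: Iterable[ProbeResult], alpha: float = 1e-5) -> List[str]:
    """Significant probes that are also gap probes and need careful interpretation,
    e.g. a check for an underlying SNP. `alpha` is an arbitrary example cutoff."""
    return [r.probe for r in results if r.gap_flag and r.p_value < alpha]
```

For instance, `hits_to_review(flag_gap_probes([ProbeResult("cg_A", 1e-8), ProbeResult("cg_B", 0.3)], {"cg_A"}))` returns `['cg_A']`, while both probes remain in the flagged result list. Keeping the rows means a gap probe that reflects a genuine association is still visible and can be checked against genotype data where it is available.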


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b39b/5142147/e85afea2b87e/13072_2016_107_Fig9_HTML.jpg
