An effective processing pipeline for harmonizing DNA methylation data from Illumina's 450K and EPIC platforms for epidemiological studies.

Affiliations

Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.

Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.

Publication Information

BMC Res Notes. 2021 Sep 8;14(1):352. doi: 10.1186/s13104-021-05741-2.

Abstract

OBJECTIVE

Illumina BeadChip arrays are commonly used to generate DNA methylation data for large epidemiological studies. Technology updates over time create challenges for data harmonization within and between studies, many of which obtained data from both the older 450K and the newer EPIC platforms. The pre-processing pipeline for DNA methylation data is not trivial and influences downstream analyses. Incorporating different platforms adds another layer of technical variability that currently recommended pipelines do not yet take into account. Our study evaluated the performance of various tools for harmonizing data across platform versions at each step of the pre-processing pipeline, including quality control (QC), normalization, batch effect adjustment, and genomic inflation. We illustrate our novel approach using 450K and EPIC data from the Diabetes Autoimmunity Study in the Young (DAISY) prospective cohort.
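The abstract lists genomic inflation as a pre-processing step but does not say how it is measured or corrected. As a generic illustration (an assumption, not necessarily the tool used in this study), the classical genomic-control inflation factor λ is the median observed 1-df χ² statistic divided by its expected null median, about 0.455; values above 1 suggest inflated test statistics. The minimal Python sketch below computes λ from two-sided p-values and applies classical genomic control by dividing every χ² by λ. All function names are hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Expected median of a 1-df chi-square under the null: (Phi^-1(0.75))^2 ≈ 0.4549
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """Convert a two-sided p-value in (0, 1] to a 1-df chi-square statistic."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    # Using inv_cdf(p / 2) (a lower-tail quantile) keeps precision for tiny p
    z = _STD_NORMAL.inv_cdf(p / 2.0)
    return z * z


def chi2_to_p(chi2):
    """Two-sided p-value for a 1-df chi-square statistic."""
    return 2.0 * _STD_NORMAL.cdf(-(chi2 ** 0.5))


def genomic_inflation(pvalues):
    """Inflation factor lambda: median observed chi-square / null median."""
    return median([p_to_chi2(p) for p in pvalues]) / CHI2_1DF_MEDIAN


def genomic_control(pvalues):
    """Classical genomic control: divide every chi-square by lambda.

    Lambda is floored at 1, so deflated results are left unchanged.
    Returns (lambda used, adjusted p-values).
    """
    lam = max(genomic_inflation(pvalues), 1.0)
    return lam, [chi2_to_p(p_to_chi2(p) / lam) for p in pvalues]
```

In epigenome-wide studies λ can also be driven by true signal, so empirical-null approaches such as the Bioconductor package bacon are often preferred; the sketch shows only the classical version.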

RESULTS

We found that normalization and probe filtering had the biggest effect on data harmonization. Employing a meta-analysis was an effective and easily executed method for accounting for platform variability. Correcting for genomic inflation also aided harmonization. We present guidelines for studies seeking to harmonize data from the 450K and EPIC platforms, which include using technical replicates to evaluate numerous pre-processing steps and employing a meta-analysis.
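The abstract recommends a meta-analysis to account for platform variability but does not state the model. One common choice, assumed here purely for illustration, is to fit the association model separately within the 450K and EPIC subsets and then combine each CpG's per-platform estimates with fixed-effect inverse-variance weighting; Cochran's Q and I² then flag CpGs whose estimates disagree between platforms. The function and field names below are hypothetical.

```python
import math
from statistics import NormalDist
from typing import NamedTuple, Sequence


class MetaResult(NamedTuple):
    beta: float  # pooled effect estimate
    se: float    # pooled standard error
    z: float     # Wald statistic
    p: float     # two-sided p-value
    q: float     # Cochran's Q (between-platform heterogeneity)
    i2: float    # I^2, share of variation attributed to heterogeneity


def inverse_variance_meta(betas: Sequence[float], ses: Sequence[float]) -> MetaResult:
    """Fixed-effect inverse-variance meta-analysis of one CpG.

    betas, ses: per-platform estimates and standard errors, e.g. one pair
    from the 450K subset and one from the EPIC subset (hypothetical inputs).
    """
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need matching estimates from at least two platforms")
    weights = [1.0 / (se * se) for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))
    # Heterogeneity: weighted squared deviations from the pooled estimate
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return MetaResult(beta, se, z, p, q, i2)
```

For example, estimates of 0.1 and 0.3 with equal standard errors of 0.1 pool to 0.2 with Q = 2 and I² = 0.5. A high I² suggests the platforms disagree for that CpG, in which case a random-effects model or platform-specific reporting may be more appropriate; this sketch implements neither.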

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b07e/8424820/6b76de7579c7/13104_2021_5741_Fig1_HTML.jpg
