Maden Sean K, Walsh Brian, Ellrott Kyle, Hansen Kasper D, Thompson Reid F, Nellore Abhinav
Computational Biology Program, Oregon Health & Science University, Portland, OR 97239, USA.
Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR 97239, USA.
Bioinform Adv. 2023 Feb 20;3(1):vbad020. doi: 10.1093/bioadv/vbad020. eCollection 2023.
Thousands of DNA methylation (DNAm) array samples from human blood are publicly available on the Gene Expression Omnibus (GEO), but they remain underutilized for experiment planning, replication, and cross-study and cross-platform analyses. To facilitate these tasks, we augmented our recountmethylation R/Bioconductor package with 12 537 uniformly processed EPIC and HM450K blood samples from GEO, as well as several new features. We then used the updated package in several illustrative analyses and found that (i) adjusting for study ID bias increased the variation explained by biological and demographic variables, (ii) most variation in autosomal DNAm was explained by genetic ancestry and CD4+ T-cell fractions, and (iii) the dependence of power to detect differential methylation on sample size was similar across peripheral blood mononuclear cells (PBMC), whole blood and umbilical cord blood. Finally, we used PBMC and whole blood to perform independent validations, recovering 38-46% of the differentially methylated probes between sexes reported by two previously published epigenome-wide association studies.
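To make finding (iii) concrete, the sketch below shows how power to detect a single differentially methylated probe grows with sample size. It is a generic normal-approximation, two-sample z-test, not the power analysis used in the study; the effect size (`delta`, a difference in mean beta values), the common standard deviation (`sd`) and the genome-wide significance threshold (`alpha = 1e-7`) are hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def approx_power(n_per_group: int, delta: float, sd: float, alpha: float = 1e-7) -> float:
    """Approximate power of a two-sided, two-sample z-test for a mean DNAm
    difference `delta` (hypothetical, e.g. in beta-value units) with a common
    standard deviation `sd` and `n_per_group` samples in each group.

    The contribution of the opposite rejection tail is ignored, which is
    negligible at epigenome-wide significance thresholds.
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    se = sd * (2 / n_per_group) ** 0.5  # standard error of the mean difference
    return z.cdf(abs(delta) / se - z_crit)


if __name__ == "__main__":
    # Hypothetical effect: 5 percentage-point difference in mean beta value, SD 0.08.
    for n in (10, 25, 50, 100, 200):
        print(n, round(approx_power(n, delta=0.05, sd=0.08), 3))
```

Because the standard error shrinks as 1/sqrt(n), power rises steeply once the effect size exceeds roughly `z_crit` standard errors. Comparing such curves across blood sample types is one way to read "similar dependence of power on sample size".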
Source code to reproduce the main results is available on GitHub (repository: recountmethylation_flexible-blood-analysis_manuscript; URL: https://github.com/metamaden/recountmethylation_flexible-blood-analysis_manuscript). All data were publicly available and were downloaded from the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/). Compilations of the analyzed public data can be accessed from the website recount.bio/data (preprocessed HM450K array data: https://recount.bio/data/remethdb_h5se-gm_epic_0-0-2_1589820348/; preprocessed EPIC array data: https://recount.bio/data/remethdb_h5se-gm_epic_0-0-2_1589820348/).
Supplementary data are available at Bioinformatics Advances online.