recountmethylation enables flexible analysis of public blood DNA methylation array data.

Author information

Maden Sean K, Walsh Brian, Ellrott Kyle, Hansen Kasper D, Thompson Reid F, Nellore Abhinav

Affiliations

Computational Biology Program, Oregon Health & Science University, Portland, OR 97239, USA.

Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR 97239, USA.

Publication information

Bioinform Adv. 2023 Feb 20;3(1):vbad020. doi: 10.1093/bioadv/vbad020. eCollection 2023.

Abstract

SUMMARY

Thousands of DNA methylation (DNAm) array samples from human blood are publicly available on the Gene Expression Omnibus (GEO), but they remain underutilized for experiment planning, replication and cross-study and cross-platform analyses. To facilitate these tasks, we augmented our recountmethylation R/Bioconductor package with 12 537 uniformly processed EPIC and HM450K blood samples on GEO as well as several new features. We subsequently used our updated package in several illustrative analyses, finding (i) study ID bias adjustment increased variation explained by biological and demographic variables, (ii) most variation in autosomal DNAm was explained by genetic ancestry and CD4+ T-cell fractions and (iii) the dependence of power to detect differential methylation on sample size was similar for each of peripheral blood mononuclear cells (PBMC), whole blood and umbilical cord blood. Finally, we used PBMC and whole blood to perform independent validations, and we recovered 38-46% of differentially methylated probes between sexes from two previously published epigenome-wide association studies.

AVAILABILITY AND IMPLEMENTATION

Source code to reproduce the main results is available on GitHub (repo: recountmethylation_flexible-blood-analysis_manuscript; url: https://github.com/metamaden/recountmethylation_flexible-blood-analysis_manuscript). All data were publicly available and downloaded from the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/). Compilations of the analyzed public data can be accessed from the website recount.bio/data (preprocessed HM450K array data: https://recount.bio/data/remethdb_h5se-gm_epic_0-0-2_1589820348/; preprocessed EPIC array data: https://recount.bio/data/remethdb_h5se-gm_epic_0-0-2_1589820348/).

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics Advances online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f430/9976962/b1198528e439/vbad020f1.jpg
