Permutation-based significance analysis reduces the type 1 error rate in bisulphite sequencing data analysis of human umbilical cord blood samples.

Affiliations

Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland.

InFLAMES Research Flagship Center, University of Turku, Turku, Finland.

Publication information

Epigenetics. 2022 Dec;17(12):1608-1627. doi: 10.1080/15592294.2022.2044127. Epub 2022 Mar 4.

DOI:10.1080/15592294.2022.2044127

PMID:35246015

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9620995/

Abstract

DNA methylation patterns are largely established in utero and might mediate the impacts of in utero conditions on later health outcomes. Associations between perinatal DNA methylation marks and pregnancy-related variables, such as maternal age and gestational weight gain, have previously been studied with methylation microarrays, which typically cover less than 2% of human CpG sites. To detect such associations outside these regions, we chose the bisulphite sequencing approach. We collected and curated clinical data on 200 newborn infants, whose umbilical cord blood samples were analysed with the reduced representation bisulphite sequencing (RRBS) method. A generalized linear mixed-effects model was fit for each high-coverage CpG site, followed by spatial and multiple testing adjustment of P values to identify differentially methylated cytosines (DMCs) and regions (DMRs) associated with clinical variables, such as maternal age, mode of delivery, and birth weight. The type 1 error rate was then evaluated with a permutation analysis. Through the permutation analysis, we discovered a strong inflation of spatially adjusted P values, and we then used the permutation results for empirical type 1 error control. The inflation of P values was caused by a common method for spatial adjustment and DMR detection, implemented in the tools comb-p and RADMeth. Based on empirically estimated significance thresholds, very little differential methylation was associated with any of the studied clinical variables other than sex. With this analysis workflow, the sex-associated differentially methylated regions were highly reproducible across studies, technologies, and statistical models.
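The idea of using permutations to set an empirical significance threshold can be illustrated with a minimal sketch. This is not the authors' workflow: the per-site test below is a simple correlation z-test standing in for the mixed-effects model, no spatial adjustment is applied, and the threshold is taken as a quantile of the per-permutation minimum P value, which is one common choice among several. The function names, sample sizes, and simulated data are all hypothetical.

```python
import numpy as np
from math import erfc, sqrt


def site_p_values(meth, covariate):
    """Two-sided P value per site from a correlation z-test (Fisher transform).

    meth: (n_samples, n_sites) array of methylation levels (simulated here).
    covariate: (n_samples,) array, e.g. a clinical variable.
    This is a stand-in for a real per-site model, not the paper's GLMM.
    """
    n = len(covariate)
    x = (covariate - covariate.mean()) / covariate.std()
    y = (meth - meth.mean(axis=0)) / meth.std(axis=0)
    r = (x[:, None] * y).mean(axis=0)          # Pearson correlation per site
    r = np.clip(r, -0.999999, 0.999999)        # keep arctanh finite
    z = np.arctanh(r) * np.sqrt(n - 3)
    return np.array([erfc(abs(v) / sqrt(2)) for v in z])


def empirical_threshold(meth, covariate, n_perm=200, alpha=0.05, seed=0):
    """P value cutoff estimated from label permutations.

    Shuffling the covariate breaks any true association, so the smallest
    P value seen in each permuted data set reflects chance alone. The
    alpha-quantile of those minima is used as the significance threshold.
    """
    rng = np.random.default_rng(seed)
    min_p = np.empty(n_perm)
    for b in range(n_perm):
        shuffled = rng.permutation(covariate)
        min_p[b] = site_p_values(meth, shuffled).min()
    return float(np.quantile(min_p, alpha))


# Simulated example: 60 samples, 300 sites, one truly associated site.
rng = np.random.default_rng(1)
covariate = rng.normal(size=60)
meth = rng.normal(size=(60, 300))
meth[:, 0] = covariate + 0.5 * rng.normal(size=60)  # planted association

threshold = empirical_threshold(meth, covariate)
observed = site_p_values(meth, covariate)
significant_sites = np.flatnonzero(observed < threshold)
```

In a real analysis, the whole pipeline, including any spatial adjustment step, would be rerun on each permuted data set, so that inflation introduced by that step is reflected in the estimated threshold.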