Estimands in epigenome-wide association studies.

Author information

Charité - University Medicine, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, 10117, Berlin, Germany.

Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, 10178, Berlin, Germany.

Publication information

Clin Epigenetics. 2021 Apr 29;13(1):98. doi: 10.1186/s13148-021-01083-9.

Abstract

BACKGROUND

In DNA methylation analyses such as epigenome-wide association studies, effects at differentially methylated CpG sites are assessed. Two kinds of outcomes can be used for statistical analysis: Beta-values and M-values. M-values follow a normal distribution and help to detect differentially methylated CpG sites. As biological effect measures, however, differences in M-values are more or less meaningless. Beta-values are of more interest because their differences can be interpreted directly as differences in the percentage of DNA methylation at a given CpG site, but they have poor statistical properties. Several frameworks have been proposed for reporting estimands in DNA methylation analysis, relying on Beta-values, M-values, or both.
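The abstract does not define the two scales, so the following is a minimal sketch that assumes the commonly used convention in which the M-value is the base-2 logit of the Beta-value. The function names and the baseline levels are illustrative and not taken from the paper. It shows why a fixed difference in M-values has no single interpretation as a methylation percentage: the same shift maps to different Beta-value differences depending on the baseline.

```python
import numpy as np

def beta_to_m(beta):
    """Base-2 logit transform from Beta-values in (0, 1) to M-values (assumed convention)."""
    beta = np.asarray(beta, dtype=float)
    return np.log2(beta / (1.0 - beta))

def m_to_beta(m):
    """Inverse transform from M-values back to Beta-values."""
    m = np.asarray(m, dtype=float)
    return 2.0 ** m / (2.0 ** m + 1.0)

# The same M-value difference applied at three hypothetical baseline methylation levels.
baseline_beta = np.array([0.05, 0.50, 0.90])
delta_m = 1.0
shifted_beta = m_to_beta(beta_to_m(baseline_beta) + delta_m)
delta_beta = shifted_beta - baseline_beta

for b, d in zip(baseline_beta, delta_beta):
    print(f"baseline Beta {b:.2f}: Beta difference {d:.3f}")
```

The Beta-value difference is largest near a baseline of 0.5 and shrinks toward the boundaries, which is the dependence on baseline that makes M-value differences hard to read biologically.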

RESULTS

We present and discuss four possible approaches to obtaining estimands in DNA methylation analysis. In addition, we present the usage of M-values or Beta-values in the context of bioinformatics pipelines, which often demand a predefined outcome. We show how differences in Beta-values depend on differences in M-values in two data simulations: an analysis without and an analysis with confounder effects. When no confounder effects are present, M-values can be used for the statistical analysis and Beta-value statistics for the reporting. When confounder effects exist, we demonstrate the resulting deviations and correct for them with the intercept method. Finally, we demonstrate the theoretical problem on two large human genome-wide DNA methylation datasets to verify the results.
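The abstract names the intercept method without defining it. The sketch below is therefore only one plausible reading, stated as an assumption: fit a confounder-adjusted linear model on M-values, then back-transform the fitted reference level (the intercept) and the reference level plus the adjusted group effect, and report their difference on the Beta scale. The simulated data, effect sizes, and variable names are hypothetical and are not the simulation settings of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

def m_to_beta(m):
    """Inverse base-2 logit from M-values to Beta-values (assumed convention)."""
    return 2.0 ** m / (2.0 ** m + 1.0)

# Hypothetical data-generating model on the M-value scale.
group = rng.integers(0, 2, size=n)           # exposure: 0 = reference, 1 = exposed
confounder = rng.normal(0.5 * group, 1.0)    # confounder correlated with the exposure
m_values = -1.0 + 0.8 * group + 1.0 * confounder + rng.normal(0.0, 0.3, size=n)
beta_values = m_to_beta(m_values)

# Parallel Beta-value statistic: raw difference in group means, ignoring the confounder.
naive_delta_beta = beta_values[group == 1].mean() - beta_values[group == 0].mean()

# Adjusted linear model on M-values: M ~ intercept + group + confounder.
design = np.column_stack([np.ones(n), group, confounder])
coef, *_ = np.linalg.lstsq(design, m_values, rcond=None)
intercept, group_effect, confounder_effect = coef

# Assumed "intercept method": difference of back-transformed fitted values
# at the reference level of the confounder (confounder = 0).
adjusted_delta_beta = m_to_beta(intercept + group_effect) - m_to_beta(intercept)

print(f"naive Beta difference:    {naive_delta_beta:.3f}")
print(f"adjusted Beta difference: {adjusted_delta_beta:.3f}")
```

In this setup the raw difference in Beta-value means absorbs part of the confounder's influence, while the back-transformed estimate uses the adjusted group effect. The back-transformed difference still depends on the reference level at which it is evaluated.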

CONCLUSIONS

Using M-values in the analysis of DNA methylation data produces effect estimates that cannot be interpreted biologically. The parallel usage of Beta-value statistics ignores possible confounder effects and therefore cannot be recommended. Hence, if differences in Beta-values are the focus of the study, the intercept method is recommended. Hyper- or hypomethylated CpG sites must then be carefully evaluated. If the aim of the study is an exploratory analysis of possible CpG sites, M-values can be used for inference.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab6b/8086103/22776c88b496/13148_2021_1083_Fig1_HTML.jpg
