Detection of suspicious interactions of spiking covariates in methylation data.

Affiliations

Charité - University Medicine, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, Berlin, 10117, Germany.

Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, Berlin, 10178, Germany.

Publication information

BMC Bioinformatics. 2020 Jan 30;21(1):36. doi: 10.1186/s12859-020-3364-6.

DOI:10.1186/s12859-020-3364-6

PMID:32000657

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6993406/

Abstract

BACKGROUND

In methylation analyses such as epigenome-wide association studies, a large number of biomarkers are tested for an association between the measured continuous outcome and different covariates. A continuous covariate such as smoking pack years (SPY), a measure of lifetime exposure to tobacco toxins, can have a spike at zero: all non-smokers produce a peak at zero, while the smokers are distributed over the remaining SPY values. A spike can also occur at the right end of the covariate distribution if a category such as "heavy smoker" is defined. Here, we focus on methylation data with a spike at the left or the right end of the distribution of a continuous covariate. After the methylation data are generated, the analysis usually consists of preprocessing, quality control, and determination of differentially methylated sites, often carried out as a pipeline, that is, as a chain of methods available in a single software package. These pipelines can distinguish between categorical covariates, i.e. for group comparisons, and continuous covariates, i.e. for linear regression. The differential methylation analysis is often performed internally by a linear regression without checking its inherent assumptions. A spike in the continuous covariate is then ignored and can cause biased results.
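The bias described above can be illustrated with a small toy example. The sketch below uses hypothetical, hand-made values (not data from the study): six non-smokers sit at SPY = 0 with a higher methylation level, and five smokers share one lower level regardless of their SPY. A pooled linear regression then reports a negative dose effect, although there is no dose effect at all among the smokers. The helper `ols_slope` and all numbers are assumptions made for illustration.

```python
def ols_slope(x, y):
    """Slope b of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical toy data: 6 non-smokers form the spike at SPY = 0,
# 5 smokers are spread over the remaining SPY values.
spy = [0, 0, 0, 0, 0, 0, 10, 20, 30, 40, 50]
# Hypothetical methylation levels: one level per group, no trend within smokers.
methylation = [0.80] * 6 + [0.60] * 5

# Regression over all samples, ignoring the spike.
pooled_slope = ols_slope(spy, methylation)
# Regression over the non-spike samples (smokers) only.
smoker_slope = ols_slope(spy[6:], methylation[6:])

print(f"pooled slope:       {pooled_slope:.5f}")
print(f"smokers-only slope: {smoker_slope:.5f}")
```

In this constructed case the pooled slope is driven entirely by the difference between the spike group and the rest, which is the kind of situation a check on the spike is meant to reveal.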

RESULTS

We reanalysed five data sets, four of them freely available from ArrayExpress, that include methylation data and smoking habits reported as smoking pack years. For this purpose, we developed an algorithm that checks for suspicious interactions between the values associated with the spike position and those at the non-spike positions of the covariate. The algorithm helps to decide whether a suspicious interaction is present and whether further investigation should be carried out. This is particularly important because the information on the differentially methylated sites is used in post-hoc analyses such as pathway analyses.
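The abstract does not spell out how the algorithm works, so the following is only a rough sketch of one plausible heuristic, not the authors' method. It assumes that a spike is "suspicious" when the mean outcome at the spike lies far from where the regression line fitted on the non-spike samples would place it. The function names `fit_line`, `spike_gap`, and `is_suspicious`, the threshold of 3, and all data values are hypothetical.

```python
from statistics import mean, stdev

def fit_line(x, y):
    """Least-squares intercept and slope of y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

def spike_gap(covariate, outcome, spike_value=0):
    """Standardized gap between the observed mean outcome at the spike and
    the value extrapolated from a line fitted on the non-spike samples."""
    spike_y = [y for x, y in zip(covariate, outcome) if x == spike_value]
    rest = [(x, y) for x, y in zip(covariate, outcome) if x != spike_value]
    rest_x = [x for x, _ in rest]
    rest_y = [y for _, y in rest]
    intercept, slope = fit_line(rest_x, rest_y)
    residuals = [y - (intercept + slope * x) for x, y in rest]
    scale = stdev(residuals)  # rough residual scale (assumed non-zero)
    predicted_at_spike = intercept + slope * spike_value
    return (mean(spike_y) - predicted_at_spike) / scale

def is_suspicious(covariate, outcome, spike_value=0, threshold=3.0):
    """Flag a site when the standardized gap exceeds a hypothetical cut-off."""
    return abs(spike_gap(covariate, outcome, spike_value)) > threshold

# Hypothetical data for one CpG site: 4 non-smokers (SPY = 0) and 5 smokers.
spy = [0, 0, 0, 0, 10, 20, 30, 40, 50]
smokers = [0.70, 0.66, 0.63, 0.58, 0.55]
consistent = [0.74, 0.73, 0.74, 0.745] + smokers   # spike continues the trend
inconsistent = [0.60, 0.61, 0.59, 0.60] + smokers  # spike breaks the trend

print(is_suspicious(spy, consistent))
print(is_suspicious(spy, inconsistent))
```

In practice such a check would be applied per site across the whole array, and a flagged site would call for closer inspection rather than automatic exclusion.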

CONCLUSIONS

Our algorithm helps to check the validity of the linear regression assumptions in a methylation analysis pipeline. These assumptions should also be considered for machine learning approaches. In addition, the algorithm can detect outliers in the continuous covariate. Using it as a preprocessing step should therefore yield statistically more robust results in methylation analysis.
