Gene-set analysis is severely biased when applied to genome-wide methylation data.

Affiliation

Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL 60637 USA.

Publication information

Bioinformatics. 2013 Aug 1;29(15):1851-7. doi: 10.1093/bioinformatics/btt311. Epub 2013 Jun 3.

Abstract

MOTIVATION

DNA methylation is an epigenetic mark that can stably repress gene expression. Because of its biological and clinical significance, several methods have been developed to compare genome-wide patterns of methylation between groups of samples. The application of gene-set analysis to identify relevant groups of genes that are enriched for differentially methylated genes is often a major component of the analysis of these data. This can be used, for example, to identify processes or pathways that are perturbed in disease development. We show that gene-set analysis, as it is typically applied to genome-wide methylation assays, is severely biased as a result of differences in the numbers of CpG sites associated with different classes of genes and gene promoters.

RESULTS

We demonstrate this bias using published data from a study of differential CpG island methylation in lung cancer and a dataset we generated to study methylation changes in patients with long-standing ulcerative colitis. We show that several of the gene sets that seem enriched would also be identified with randomized data. We suggest two existing approaches that can be adapted to correct the bias. Accounting for the bias in the lung cancer and ulcerative colitis datasets provides novel biological insights into the role of methylation in cancer development and chronic inflammation, respectively. Our results have significant implications for many previous genome-wide methylation studies that have drawn conclusions on the basis of such strongly biased analysis.
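
The observation that seemingly enriched gene sets also emerge from randomized data can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' analysis: the gene counts, the number of CpG sites per gene, the per-site threshold, and the function names are all hypothetical. It also assumes that a gene is called differentially methylated when any one of its sites passes the threshold. Under that assumption, a gene with n sites is called by chance with probability 1 - (1 - alpha)^n, so a gene set made up of CpG-rich genes looks enriched even though no site carries a real signal.

```python
import math
import random


def hypergeom_sf(k, population, successes, draws):
    """Return P(X >= k) for a hypergeometric X (one-sided enrichment p-value)."""
    total = math.comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def simulate_null_enrichment(seed=0, n_rich=200, n_poor=1800,
                             sites_rich=20, sites_poor=2, alpha=0.01):
    """Simulate a null experiment and test a CpG-rich gene set for enrichment.

    All sizes and the per-site threshold are hypothetical illustration values.
    """
    rng = random.Random(seed)
    # Genes 0..n_rich-1 form the gene set; each carries more CpG sites.
    sites_per_gene = [sites_rich] * n_rich + [sites_poor] * n_poor

    # Under the null, every site's p-value is uniform on [0, 1). A gene is
    # called differentially methylated if any of its sites falls below alpha.
    called = [
        any(rng.random() < alpha for _ in range(n_sites))
        for n_sites in sites_per_gene
    ]

    in_set_called = sum(called[:n_rich])
    total_called = sum(called)
    # Standard over-representation test that ignores CpG counts per gene.
    p_value = hypergeom_sf(in_set_called, n_rich + n_poor, n_rich, total_called)
    return {
        "in_set_called": in_set_called,
        "out_set_called": total_called - in_set_called,
        "p_value": p_value,
    }


if __name__ == "__main__":
    print(simulate_null_enrichment())
```

With these illustrative settings, a CpG-rich gene is called by chance roughly nine times as often as a CpG-poor one (about 18% versus 2%), so the naive hypergeometric test reports strong enrichment for a gene set that has no true signal. A bias-aware analysis would need a null model that accounts for the number of CpG sites per gene.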

CONTACT

cathal.seoighe@nuigalway.ie

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
