Network-based regularization for matched case-control analysis of high-dimensional DNA methylation data.

Affiliation

Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10032, USA.

Publication information

Stat Med. 2013 May 30;32(12):2127-39. doi: 10.1002/sim.5694. Epub 2012 Dec 5.

Abstract

Matched case-control designs are commonly used to control for potential confounding factors in genetic epidemiology studies, especially epigenetic studies of DNA methylation. Compared with unmatched case-control studies with high-dimensional genomic or epigenetic data, few variable selection methods are available for matched sets. In an earlier paper, we proposed a penalized logistic regression model with a network-based penalty for the analysis of unmatched DNA methylation data. However, matched designs are widely used in epigenetic studies that compare DNA methylation between tumor and adjacent non-tumor tissues or between pre-treatment and post-treatment conditions, and applying ordinary logistic regression while ignoring the matching is known to introduce serious bias in estimation. In this paper, we develop a penalized conditional logistic model for the analysis of matched DNA methylation data. It uses a network-based penalty that encourages a grouping effect of (1) linked Cytosine-phosphate-Guanine (CpG) sites within a gene or (2) linked genes within a genetic pathway. In our simulation studies, the conditional logistic model outperformed the unconditional logistic model in high-dimensional variable selection for matched case-control data. We further investigated the benefits of using biological group or graph information for matched case-control data. We applied the proposed method to a genome-wide DNA methylation study of hepatocellular carcinoma (HCC), in which DNA methylation levels of tumor and adjacent non-tumor tissues from HCC patients were measured with the Illumina Infinium HumanMethylation27 Beadchip. The method identified several new CpG sites and genes that are known to be related to HCC but were missed by the standard method in the original paper.
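The abstract does not give the exact form of the penalty, so the following is only a minimal sketch of what a penalized conditional logistic objective might look like. It assumes 1:1 matching (for example, tumor versus adjacent non-tumor tissue from the same patient), in which case the conditional likelihood depends only on within-pair differences, and it assumes a penalty made of an L1 term plus a graph-Laplacian quadratic term, which is one common way to encourage linked features to receive similar coefficients. All function names and the tuning parameters `lam1` and `lam2` are hypothetical and not taken from the paper.

```python
import numpy as np


def pair_conditional_nll(beta, x_case, x_control):
    """Negative conditional log-likelihood for 1:1 matched pairs.

    For each pair, P(case is the case | pair) = 1 / (1 + exp(-(x_case - x_control) @ beta)),
    so no intercept or pair-specific effect needs to be estimated.
    """
    d = x_case - x_control              # within-pair differences, shape (n_pairs, p)
    eta = d @ beta
    return np.sum(np.logaddexp(0.0, -eta))  # stable log(1 + exp(-eta))


def graph_laplacian(adjacency):
    """Unnormalized Laplacian L = D - A of an undirected feature graph (assumed form)."""
    degree = adjacency.sum(axis=1)
    return np.diag(degree) - adjacency


def penalized_objective(beta, x_case, x_control, laplacian, lam1, lam2):
    """Conditional NLL + lam1 * ||beta||_1 + lam2 * beta' L beta (assumed penalty)."""
    nll = pair_conditional_nll(beta, x_case, x_control)
    sparsity = lam1 * np.sum(np.abs(beta))
    smoothness = lam2 * (beta @ laplacian @ beta)
    return nll + sparsity + smoothness


# Toy usage: 3 matched pairs, 2 linked CpG sites (synthetic numbers, illustration only).
x_case = np.array([[0.8, 0.7], [0.6, 0.9], [0.7, 0.6]])
x_control = np.array([[0.3, 0.2], [0.4, 0.5], [0.2, 0.3]])
adjacency = np.array([[0.0, 1.0], [1.0, 0.0]])
L = graph_laplacian(adjacency)
value = penalized_objective(np.array([1.0, 1.0]), x_case, x_control, L, lam1=0.1, lam2=0.5)
```

With the unnormalized Laplacian, the quadratic term equals the sum of squared coefficient differences over linked pairs of features, so it is zero when linked features share the same coefficient. An unconditional logistic model would instead fit case and control rows as independent observations, which is the approach the abstract describes as biased for matched data.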
