Zhang Qingyang, Burdette Joanna E, Wang Ji-Ping
Department of Statistics, Northwestern University, Evanston, IL 60208, USA.
Department of Medicinal Chemistry and Pharmacognosy, University of Illinois, Chicago, IL 60607, USA.
BMC Syst Biol. 2014 Dec 31;8:1338. doi: 10.1186/s12918-014-0136-9.
In recent years, tremendous effort has gone into elucidating the molecular basis of the initiation and progression of ovarian cancer. However, most existing studies have focused on individual genes or a single type of data; by overlooking the interactions among different genetic and epigenetic factors, they may lack the power to detect the complex mechanisms of cancer formation.
We propose an integrative framework to identify genetic and epigenetic features related to ovarian cancer and to quantify the causal relationships among these features using a probabilistic graphical model built on data from The Cancer Genome Atlas (TCGA). For feature selection, we first defined a set of seed genes comprising 48 candidate tumor suppressors or oncogenes and an additional 20 ovarian cancer-related genes reported in the literature. The seed genes were then fed into a stepwise correlation-based selector, which identified 271 additional features: 177 genes, 82 copy number variation sites, 11 methylation sites and 1 somatic mutation (at gene TP53). We built a Bayesian network model with a logit link function to quantify the causal relationships among these features and discovered a set of 13 hub genes including ARID1A, C19orf53, CSKN2A1 and COL5A2. The directed graph revealed many potential genetic pathways, some of which confirm existing results in the literature. Clustering analysis further suggested four gene clusters, three of which correspond to well-defined cellular processes: cell division, tumor invasion and the mitochondrial system. In addition, two genes related to glycoprotein synthesis, PSG11 and GALNT10, were found to be highly predictive of the overall survival time of ovarian cancer patients.
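The abstract does not spell out how the stepwise correlation-based selector works, so the following is only a minimal sketch under stated assumptions: features are added in rounds whenever their absolute Pearson correlation with a feature selected in the previous round reaches a cutoff. The function names (`pearson`, `stepwise_select`), the cutoff of 0.8, the round limit and the toy data are all hypothetical and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def stepwise_select(data, seeds, threshold=0.8, max_rounds=10):
    """Grow a feature set outward from the seed features.

    data: dict mapping feature name -> list of per-sample values.
    seeds: names of the initial (seed) features.
    threshold, max_rounds: hypothetical tuning knobs, not from the paper.
    """
    selected = list(seeds)
    frontier = list(seeds)  # features added in the most recent round
    for _ in range(max_rounds):
        added = []
        for name, values in data.items():
            if name in selected or name in added:
                continue
            # Keep a candidate if it correlates strongly with any frontier feature.
            if any(abs(pearson(values, data[f])) >= threshold for f in frontier):
                added.append(name)
        if not added:
            break  # no new features passed the cutoff, so stop
        selected.extend(added)
        frontier = added
    return selected


# Toy data: one seed, one strongly correlated feature, one unrelated feature.
toy = {
    "SEED": [1, 2, 3, 4, 5],
    "GENE_A": [2, 4, 6, 8, 11],
    "NOISE": [3, 1, 4, 1, 5],
}
print(stepwise_select(toy, ["SEED"]))
```

In this toy run only `GENE_A` joins the seed, because `NOISE` stays below the cutoff against both the seed and `GENE_A`. A real analysis would also need to handle mixed data types (expression, copy number, methylation, mutation) and multiple testing, which this sketch ignores.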
The proposed framework is effective in identifying potentially important genetic and epigenetic features related to complex cancers. The constructed Bayesian network identified some new genetic/epigenetic pathways, which may shed new light on the molecular mechanisms of ovarian cancer.