MRC Biostatistics Unit, Cambridge University, Cambridge, United Kingdom.
PLoS One. 2019 Jul 23;14(7):e0213221. doi: 10.1371/journal.pone.0213221. eCollection 2019.
The copy numbers of genes in cancer samples are often highly disrupted and form a natural amplification/deletion experiment encompassing multiple genes. Matched array comparative genomics and transcriptomics datasets from such samples can be used to predict inter-chromosomal gene regulatory relationships. Previously we published the database METAMATCHED, comprising the results from such an analysis of a large number of publicly available cancer datasets. Here we investigate genes in the database which are unusual in that their copy number exhibits consistent heterogeneous disruption in a high proportion of the cancer datasets. We assess the potential relevance of these genes to the pathology of the cancer samples, in light of their predicted regulatory relationships and enriched biological pathways. A network-based method was used to identify enriched pathways from the genes' inferred targets. The analysis predicts both known and new regulator-target interactions and pathway memberships. We examine examples in detail, in particular the gene POGZ, which is disrupted in many of the cancer datasets and has an unusually large number of predicted targets, from which the network analysis predicts membership of cancer-related pathways. The results suggest that genes exhibiting consistent heterogeneous copy number disruption are closely involved in known cancer pathways. Further experimental work would clarify their relevance to tumor biology. The results of the analysis, presented in the database METAMATCHED and included here as an R archive file, constitute a large number of predicted regulatory relationships and pathway memberships which we anticipate will be useful in informing such experiments.