Association between expression of random gene sets and survival is evident in multiple cancer types and may be explained by sub-classification.

Affiliation

IBM Research-Haifa, Haifa, Israel.

Publication information

PLoS Comput Biol. 2018 Feb 22;14(2):e1006026. doi: 10.1371/journal.pcbi.1006026. eCollection 2018 Feb.

Abstract

One of the goals of cancer research is to identify a set of genes that cause or control disease progression. However, although multiple such gene sets have been published, they usually agree very poorly with one another, and very few of the genes have proved to be functional therapeutic targets. Furthermore, recent findings from a breast cancer gene-expression cohort showed that randomly selected gene sets predict survival far more often than expected by chance. These results imply that many of the genes identified in breast cancer gene-expression analyses may not cause cancer progression, even though they can still be highly predictive of prognosis. We performed a similar analysis on all cancer types available in The Cancer Genome Atlas (TCGA), estimating the predictive power of random gene sets for survival. Our work shows that most cancer types exhibit this property: random selections of genes are more predictive of survival than expected. In contrast to previous work, the property is not removed by using a proliferation signature, which implies that proliferation may not always be the confounder that drives it. We suggest one possible solution, data-driven sub-classification, which reduces the property significantly. Our results suggest that the predictive power of random gene sets may be used to detect the existence of sub-classes in the data, and thus may allow a better understanding of patient stratification. Furthermore, by reducing the observed bias, this approach may allow more direct identification of biologically relevant, and potentially causal, genes.
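The abstract describes the random-gene-set test only in words. The sketch below illustrates the idea in pure Python; it is not the authors' pipeline, and every function and parameter name in it is hypothetical. Assumptions: each gene is z-scored across patients, a gene set's score is the mean z-score of its genes, patients are split at the median score, and a split counts as predictive when a two-group log-rank test gives p < `alpha`. The optional `strata` argument stands in for sub-class labels: the median split is made within each sub-class and the statistics are pooled as a stratified log-rank test. The paper derives sub-classes in a data-driven way; here they are simply supplied as input.

```python
import math
import random
import statistics


def zscore(row):
    """Standardize one gene's expression across patients (zeros if constant)."""
    mu = statistics.fmean(row)
    sd = statistics.pstdev(row)
    return [(x - mu) / sd if sd > 0 else 0.0 for x in row]


def logrank_terms(times, events, in_group):
    """Observed-minus-expected deaths and variance for group 1 of a two-group log-rank test."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):  # distinct event times
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if in_group[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and in_group[i])
        o_minus_e += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of d1
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e, var


def gene_set_pvalue(z, times, events, genes, strata):
    """Stratified log-rank p-value for a median split on the mean z-score of `genes`.
    With a single stratum this is the ordinary log-rank test."""
    n = len(times)
    score = [sum(z[g][j] for g in genes) / len(genes) for j in range(n)]
    total_oe, total_var = 0.0, 0.0
    for s in set(strata):
        idx = [j for j in range(n) if strata[j] == s]
        med = statistics.median([score[j] for j in idx])
        high = [score[j] > med for j in idx]  # median split within the stratum
        oe, v = logrank_terms([times[j] for j in idx], [events[j] for j in idx], high)
        total_oe += oe
        total_var += v
    if total_var == 0:
        return 1.0
    chi2 = total_oe ** 2 / total_var
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def fraction_predictive(expr, times, events, set_size=100, n_sets=200,
                        alpha=0.05, strata=None, seed=0):
    """Fraction of random gene sets whose score separates survival at p < alpha.

    expr: genes x patients matrix (list of lists).
    events: 1 = death observed, 0 = censored.
    strata: optional sub-class label per patient (hypothetical input).
    """
    rng = random.Random(seed)
    if strata is None:
        strata = [0] * len(times)
    z = [zscore(row) for row in expr]
    hits = 0
    for _ in range(n_sets):
        genes = rng.sample(range(len(z)), set_size)
        if gene_set_pvalue(z, times, events, genes, strata) < alpha:
            hits += 1
    return hits / n_sets
```

If random sets carried no more survival information than chance, the returned fraction would sit near `alpha`. Comparing `fraction_predictive(expr, times, events)` with `fraction_predictive(expr, times, events, strata=labels)` mirrors, under the assumptions above, the abstract's point: when a hidden sub-class shifts both expression and survival, almost any random set looks predictive, and conditioning on the sub-class removes most of that effect.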

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74f5/5839591/533e8b766d7a/pcbi.1006026.g001.jpg