A Topic Modeling Analysis of TCGA Breast and Lung Cancer Transcriptomic Data.

Authors

Valle Filippo, Osella Matteo, Caselle Michele

Affiliation

Physics Department, University of Turin and INFN, via P. Giuria 1, 10125 Turin, Italy.

Publication information

Cancers (Basel). 2020 Dec 16;12(12):3799. doi: 10.3390/cancers12123799.

Abstract

Topic modeling is a widely used technique for extracting relevant information from large arrays of data. The problem of finding a topic structure in a dataset was recently recognized to be analogous to the community detection problem in network theory. Leveraging this analogy, a new class of topic modeling strategies has been introduced to overcome some of the limitations of classical methods. This paper applies these recent ideas to TCGA transcriptomic data on breast and lung cancer. The established cancer subtype organization is well reconstructed in the inferred latent topic structure. Moreover, we identify specific topics that are enriched in genes known to play a role in the corresponding disease and are strongly related to the survival probability of patients. Finally, we show that a simple neural network classifier operating in the low-dimensional topic space can predict the cancer subtype of a test expression sample with high accuracy.
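The analogy between topic modeling and community detection can be made concrete with a small sketch. Network-based topic models typically treat the data matrix as a bipartite network, here with samples on one side and genes on the other, and then search for communities in it. The Python snippet below only illustrates that first representation step under stated assumptions: the function names, the toy counts, and the choice to use integer counts as edge weights are all hypothetical and are not taken from the paper. It does not perform the community detection itself.

```python
from collections import defaultdict


def expression_to_bipartite_edges(counts, sample_ids, gene_ids):
    """Turn a samples x genes count matrix into weighted bipartite edges.

    Hypothetical helper: each nonzero entry becomes one edge
    (sample, gene, weight), mirroring the document-word view
    used in topic modeling.
    """
    edges = []
    for i, sample in enumerate(sample_ids):
        for j, gene in enumerate(gene_ids):
            weight = counts[i][j]
            if weight > 0:  # zero counts produce no edge
                edges.append((sample, gene, weight))
    return edges


def node_strengths(edges):
    """Sum the edge weights attached to every node (sample or gene)."""
    strength = defaultdict(int)
    for sample, gene, weight in edges:
        strength[sample] += weight
        strength[gene] += weight
    return dict(strength)


# Toy data, invented purely for illustration (not TCGA values).
toy_samples = ["sample_A", "sample_B"]
toy_genes = ["gene_1", "gene_2", "gene_3"]
toy_counts = [
    [3, 0, 1],  # sample_A
    [0, 2, 2],  # sample_B
]

toy_edges = expression_to_bipartite_edges(toy_counts, toy_samples, toy_genes)
toy_strengths = node_strengths(toy_edges)
```

In this picture, a community detection algorithm run on the edge list would group genes into "topics" and samples into clusters at the same time, which is the sense in which the two problems are analogous.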

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/57cd/7766023/4e5cccdd9f8f/cancers-12-03799-g001.jpg
