A Contrastive-Learning-Based Deep Neural Network for Cancer Subtyping by Integrating Multi-Omics Data.

Affiliations

School of Mathematics and Big Data, Foshan University, Foshan, 528000, China.

Peng Cheng Laboratory, Shenzhen, 518055, China.

Publication information

Interdiscip Sci. 2024 Dec;16(4):966-975. doi: 10.1007/s12539-024-00641-y. Epub 2024 Sep 4.

Abstract

BACKGROUND

Accurate identification of cancer subtypes is crucial for disease prognosis evaluation and personalized patient management. Recent advances in computational methods have demonstrated that multi-omics data provides valuable insights into tumor molecular subtyping. However, the high dimensionality and small sample size of the data may result in ambiguous and overlapping cancer subtypes during clustering. In this study, we propose a novel contrastive-learning-based approach to address this issue. The proposed end-to-end deep learning method can extract crucial information from the multi-omics features by self-supervised learning for patient clustering.
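
The abstract does not state which contrastive objective is used. As a rough illustration of the general idea only, the sketch below computes a generic InfoNCE-style loss over two embedded "views" of each patient. The function names, the temperature value, and the choice of cosine similarity are all assumptions made for this example and are not taken from the COLCS code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss (hypothetical helper, not the authors' code).

    view_a[i] and view_b[i] are two embeddings of the same patient i
    (the positive pair); every view_b[j] with j != i acts as a negative.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching index i as the correct class.
        total += log_denominator - logits[i]
    return total / n

# Toy embeddings for two patients.
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(view_a, [[1.0, 0.0], [0.0, 1.0]])   # positives agree
shuffled = info_nce(view_a, [[0.0, 1.0], [1.0, 0.0]])  # positives disagree
print(aligned < shuffled)
```

Minimizing a loss of this kind pulls the two views of the same patient together and pushes different patients apart, which is the property that makes the learned embeddings easier to cluster.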

RESULTS

By applying our method to nine public cancer datasets, we have demonstrated superior performance compared to existing methods in separating patients with different survival outcomes (p < 0.05). To further evaluate the impact of various omics data on cancer survival, we developed an XGBoost classification model and found that mRNA had the highest importance score, followed by DNA methylation and miRNA. In the presented case study, our method successfully clustered subtypes and identified 14 cancer-related genes, of which 12 (85.7%) were validated through literature review.
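
The abstract reports p < 0.05 for survival separation but does not name the test. A log-rank test is a common choice for comparing survival between clusters, so the sketch below assumes it, restricted to two groups for brevity. The function name and the toy follow-up times are hypothetical and are not data from the study.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test (illustrative helper).

    times_*  : follow-up times
    events_* : 1 if the event (death) was observed, 0 if censored
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in samples if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = sum(1 for s in samples if s[0] >= t)
        at_risk_a = sum(1 for s in samples if s[0] >= t and s[2] == 0)
        deaths = sum(1 for s in samples if s[0] == t and s[1])
        deaths_a = sum(1 for s in samples if s[0] == t and s[1] and s[2] == 0)
        frac_a = at_risk_a / at_risk
        # Observed events in group A minus their expectation under H0.
        observed_minus_expected += deaths_a - deaths * frac_a
        # Hypergeometric variance; undefined when one subject is at risk.
        if at_risk > 1:
            variance += (deaths * frac_a * (1 - frac_a)
                         * (at_risk - deaths) / (at_risk - 1))

    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Toy example: one cluster with short survival, one with long survival.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
                       [10, 11, 12, 13, 14], [1, 1, 1, 1, 1])
print(p < 0.05)
```

With more than two subtypes, the same observed-minus-expected idea extends to a multi-group statistic. An established survival-analysis library would normally be used in practice rather than a hand-written routine.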

CONCLUSIONS

Our findings demonstrate that our method can identify cancer subtypes that are both statistically and biologically significant. The code for COLCS is available at: https://github.com/Mercuriiio/COLCS
