DEDUCE: Multi-head attention decoupled contrastive learning to discover cancer subtypes based on multi-omics data.

Affiliations

College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, Hunan, China.

Department of Thoracic Surgery, The Second Xiangya Hospital, Central South University, Changsha, 410083, Hunan, China.

Publication information

Comput Methods Programs Biomed. 2024 Dec;257:108478. doi: 10.1016/j.cmpb.2024.108478. Epub 2024 Oct 30.

DOI:10.1016/j.cmpb.2024.108478

PMID:39504713

Abstract

BACKGROUND AND OBJECTIVE

Given the high heterogeneity and clinical diversity of cancer, substantial variations exist in multi-omics data and clinical features across different cancer subtypes.

METHODS

We propose DEDUCE, a model based on a symmetric multi-head attention encoder (SMAE), which applies unsupervised contrastive learning to multi-omics cancer data in order to identify and characterize cancer subtypes. The unsupervised SMAE extracts contextual features and long-range dependencies from multi-omics data, thereby mitigating the impact of noise. Importantly, DEDUCE introduces a subtype decoupled contrastive learning method, built on the multi-head attention mechanism, that simultaneously learns features from multi-omics data and clusters samples to identify cancer subtypes. Subtypes are clustered by calculating the similarity between samples in both the feature space and the sample space of the multi-omics data. The fundamental idea is to decouple the various attributes of multi-omics features and learn them as contrastive terms. A contrastive loss function quantifies the disparity between positive and negative examples, and the model is trained to minimize this loss, thereby promoting better feature representations.
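The abstract does not give the loss in closed form. As a minimal sketch of the general idea, assuming a generic decoupled contrastive objective (an InfoNCE-style loss in which the positive pair is left out of the denominator), the following computes the loss for two embedded views of the same samples. The function names, the two-view setup, the cosine similarity, and the `temperature` parameter are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def decoupled_contrastive_loss(view_a, view_b, temperature=0.5):
    """Mean decoupled contrastive loss over the anchors in view_a.

    view_a[i] and view_b[i] are two embeddings of sample i (the positive
    pair). Embeddings of every other sample, from both views, act as
    negatives. Unlike plain InfoNCE, the positive term is excluded from
    the denominator. Requires at least two samples.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        pos = cosine(view_a[i], view_b[i]) / temperature
        neg_logits = []
        for j in range(n):
            if j == i:
                continue
            neg_logits.append(cosine(view_a[i], view_a[j]) / temperature)
            neg_logits.append(cosine(view_a[i], view_b[j]) / temperature)
        # Numerically stable log-sum-exp over the negatives only.
        m = max(neg_logits)
        log_sum_neg = m + math.log(sum(math.exp(x - m) for x in neg_logits))
        total += -pos + log_sum_neg
    return total / n


# Toy usage: two samples, each embedded twice (hypothetical values).
anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]
loss = decoupled_contrastive_loss(anchors, matched)
```

Minimizing this quantity pulls the two views of each sample together and pushes different samples apart. Because the positive term is excluded from the denominator, the value can be negative, which is expected for this formulation.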

RESULTS

DEDUCE was evaluated extensively on simulated multi-omics datasets, single-cell multi-omics datasets, and cancer multi-omics datasets, where it outperformed 10 deep learning models. It also outperformed state-of-the-art methods, and ablation experiments demonstrated the effectiveness of each module in the model. Finally, we applied DEDUCE to acute myeloid leukemia (AML) and identified six subtypes.

CONCLUSION

The proposed DEDUCE model learns features from multi-omics data through the SMAE, and subtype decoupled contrastive learning consistently optimizes the model for clustering and identifying cancer subtypes. DEDUCE demonstrates a significant capability for discovering new cancer subtypes. We applied it to identify six subtypes of AML. GO functional enrichment analysis, analysis of subtype-specific biological functions, and GSEA of the AML subtypes further enhance the interpretability of DEDUCE in identifying cancer subtypes.
