CrossAttOmics: multiomics data integration with cross-attention.

Authors

Beaude Aurélien, Augé Franck, Zehraoui Farida, Hanczar Blaise

Affiliations

Université Paris-Saclay, Univ Evry, IBISC, Evry-Courcouronnes 91020, France.

Sanofi R&D, Translational Precision Medicine, Vitry-sur-Seine 94400, France.

Publication

Bioinformatics. 2025 Jun 2;41(6). doi: 10.1093/bioinformatics/btaf302.

Abstract

MOTIVATION

Advances in high-throughput technologies have given broad access to many types of omics data. Each omics layer provides only a partial view of the underlying biological process, so integrating several layers could support more accurate diagnosis. However, the complexity of omics data calls for approaches that can capture complex relationships. One way to do this is to exploit the known regulatory links between the different omics, which could help in constructing a better multimodal representation.

RESULTS

In this article, we propose CrossAttOmics, a new deep-learning architecture based on the cross-attention mechanism for multiomics integration. Each modality is projected into a lower-dimensional space by its own modality-specific encoder. Interactions between modalities with known regulatory links are computed in the feature representation space with cross-attention. The experiments carried out in this article show that our model can accurately predict cancer types by exploiting the interactions between multiple modalities. CrossAttOmics outperforms other methods when there are few paired training examples. Our approach can be combined with attribution methods such as layer-wise relevance propagation (LRP) to identify which interactions are the most important.
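
The pipeline described above (one encoder per modality, then cross-attention along known regulatory links, then a classifier) can be sketched in a few lines of NumPy. This is a minimal illustration with random, untrained weights, not the authors' implementation, which is in the repository listed under Availability and Implementation. The modality names, the link list, the token and embedding sizes, and the helper names (encode, cross_attention) are all assumptions made for the example. The choice of the regulated modality as the query side is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def encode(x, w, n_tokens):
    # Hypothetical modality-specific encoder: one linear layer + tanh,
    # reshaped into n_tokens feature vectors per sample.
    h = np.tanh(x @ w)                          # (batch, n_tokens * d)
    return h.reshape(x.shape[0], n_tokens, -1)  # (batch, n_tokens, d)

def cross_attention(target, source, wq, wk, wv):
    # Queries come from the target modality, keys and values from the source.
    q, k, v = target @ wq, source @ wk, source @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)          # (batch, tgt_tokens, src_tokens)
    return weights @ v, weights

# Assumed toy setup: three modalities and two directed regulatory links.
n_features = {"miRNA": 50, "mRNA": 200, "protein": 30}
links = [("miRNA", "mRNA"), ("mRNA", "protein")]   # (source, target), illustrative
batch, n_tokens, d, n_classes = 4, 8, 16, 5

# Random stand-in data and encoder weights (no training in this sketch).
X = {m: rng.normal(size=(batch, p)) for m, p in n_features.items()}
enc_w = {m: rng.normal(scale=0.1, size=(p, n_tokens * d))
         for m, p in n_features.items()}
Z = {m: encode(X[m], enc_w[m], n_tokens) for m in X}

# One cross-attention block per regulatory link.
fused, attn = [], {}
for src, tgt in links:
    wq, wk, wv = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
    out, attn[(src, tgt)] = cross_attention(Z[tgt], Z[src], wq, wk, wv)
    fused.append(out.reshape(batch, -1))

# Concatenate the interaction representations and classify.
fused = np.concatenate(fused, axis=1)
w_out = rng.normal(scale=0.1, size=(fused.shape[1], n_classes))
probs = softmax(fused @ w_out, axis=-1)         # (batch, n_classes)
```

Restricting attention to the pairs in links, rather than attending between every pair of modalities, is what encodes the prior knowledge about regulation in this sketch. The stored attn weights are only a by-product here; the abstract refers to attribution methods such as LRP, which this example does not implement.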

AVAILABILITY AND IMPLEMENTATION

The code is available at https://github.com/Sanofi-Public/CrossAttOmics and https://doi.org/10.5281/zenodo.15065928. TCGA data can be downloaded from the Genomic Data Commons Data Portal. CCLE data can be downloaded from the DepMap portal.
