CLCLSA: Cross-omics Linked embedding with Contrastive Learning and Self Attention for multi-omics integration with incomplete multi-omics data.

Authors

Zhao Chen, Liu Anqi, Zhang Xiao, Cao Xuewei, Ding Zhengming, Sha Qiuying, Shen Hui, Deng Hong-Wen, Zhou Weihua

Affiliations

Department of Applied Computing, Michigan Technological University, 1400 Townsend Dr, Houghton, MI 49931, USA.

Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA 70112, USA.

Publication information

ArXiv. 2023 Apr 12:arXiv:2304.05542v1.

PMID:37090237

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10120753/

Abstract

Integration of heterogeneous and high-dimensional multi-omics data is becoming increasingly important in understanding genetic data. Each omics technique only provides a limited view of the underlying biological process and integrating heterogeneous omics layers simultaneously would lead to a more comprehensive and detailed understanding of diseases and phenotypes. However, one obstacle faced when performing multi-omics data integration is the existence of unpaired multi-omics data due to instrument sensitivity and cost. Studies may fail if certain aspects of the subjects are missing or incomplete. In this paper, we propose a deep learning method for multi-omics integration with incomplete data by Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA). Utilizing complete multi-omics data as supervision, the model employs cross-omics autoencoders to learn the feature representation across different types of biological data. The multi-omics contrastive learning, which is used to maximize the mutual information between different types of omics, is employed before latent feature concatenation. In addition, the feature-level self-attention and omics-level self-attention are employed to dynamically identify the most informative features for multi-omics data integration. Extensive experiments were conducted on four public multi-omics datasets. The experimental results indicated that the proposed CLCLSA outperformed the state-of-the-art approaches for multi-omics data classification using incomplete multi-omics data.
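The abstract names two mechanisms that a few lines of code can make concrete: a contrastive objective that pulls together the latent features of the same sample across omics types, and an omics-level attention step that weights each omics type before concatenation. The sketch below is only an illustration under stated assumptions, not the authors' implementation. It uses a symmetric InfoNCE loss as a common stand-in for a mutual-information-maximizing contrastive objective (the paper's actual loss may differ), and it uses fixed placeholder attention scores where a trained model would learn them. The function names, the `temperature` value, and the `z_mrna` / `z_meth` variable names are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def _diag_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE loss between two omics embeddings of the same samples.

    Row i of z_a and row i of z_b are treated as a positive pair; every other
    row in the batch acts as a negative. (Assumed stand-in objective.)
    """
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = (z_a @ z_b.T) / temperature  # (n_samples, n_samples)
    return 0.5 * (_diag_cross_entropy(logits) + _diag_cross_entropy(logits.T))


def omics_level_attention(latents, scores):
    """Softmax the per-omics scores, scale each latent block, then concatenate.

    `scores` are placeholders here; a trained model would produce them.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    fused = np.concatenate([w * z for w, z in zip(weights, latents)], axis=1)
    return fused, weights


# Toy latent features for 8 samples from two hypothetical omics types.
rng = np.random.default_rng(0)
z_mrna = rng.normal(size=(8, 16))
z_meth = z_mrna + 0.05 * rng.normal(size=(8, 16))  # nearly aligned view

loss_paired = info_nce(z_mrna, z_meth)
loss_mismatched = info_nce(z_mrna, z_meth[::-1])  # positives deliberately broken
fused, weights = omics_level_attention([z_mrna, z_meth], scores=[0.2, 1.0])
```

In this toy setup the loss for correctly paired samples is lower than the loss for deliberately mismatched pairs, which is the behavior a contrastive objective is meant to reward. The fused matrix keeps one row per sample and has the combined width of the weighted latent blocks.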
