CLCLSA: Cross-omics linked embedding with contrastive learning and self attention for integration with incomplete multi-omics data.

Affiliations

Department of Computer Science, Kennesaw State University, Marietta, GA, 30060, USA.

Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA.

Publication information

Comput Biol Med. 2024 Mar;170:108058. doi: 10.1016/j.compbiomed.2024.108058. Epub 2024 Jan 28.

DOI:10.1016/j.compbiomed.2024.108058

PMID:38295477

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10959569/

Abstract

Integrating heterogeneous, high-dimensional multi-omics data is becoming increasingly important for understanding the etiology of complex genetic diseases. Each omics technique provides only a limited view of the underlying biological process, and integrating heterogeneous omics layers simultaneously leads to a more comprehensive and detailed understanding of diseases and phenotypes. However, one obstacle in multi-omics data integration is unpaired multi-omics data, which arise from limited instrument sensitivity and from cost. Studies may fail if certain aspects of the subjects are missing or incomplete. In this paper, we propose a deep learning method for multi-omics integration with incomplete data: Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA). Using complete multi-omics data as supervision, the model employs cross-omics autoencoders to learn feature representations across different types of biological data. Multi-omics contrastive learning maximizes the mutual information between different omics types. In addition, feature-level and omics-level self-attention are employed to dynamically identify the most informative features for multi-omics data integration. Finally, a Softmax classifier performs multi-omics data classification. Extensive experiments were conducted on four public multi-omics datasets. The experimental results indicate that the proposed CLCLSA produces promising results in multi-omics data classification using both complete and incomplete multi-omics data.
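To make the contrastive-learning step more concrete, the following is a minimal NumPy sketch of one common contrastive objective, the symmetric InfoNCE loss, applied to embeddings from two omics types. Minimizing InfoNCE maximizes a lower bound on the mutual information between the two views. This is an illustration under stated assumptions, not necessarily the loss used in CLCLSA: the function name `info_nce`, the `temperature` value, the embedding sizes, and the synthetic `z_mrna` / `z_meth` arrays are all hypothetical.

```python
import numpy as np


def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE loss between two (n_samples, dim) embedding matrices.

    Row i of z_a and row i of z_b are assumed to come from the same subject
    (a positive pair); every other row in the batch acts as a negative.
    """
    # L2-normalize so that dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # (n, n) pairwise similarity matrix

    def cross_entropy_diag(scores):
        # Numerically stable row-wise log-softmax; the target is the diagonal.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the a->b and b->a directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))


# Synthetic stand-ins for two omics embeddings (hypothetical data).
rng = np.random.default_rng(0)
shared = rng.normal(size=(64, 16))  # signal shared by both views per subject
z_mrna = shared + 0.1 * rng.normal(size=shared.shape)
z_meth = shared + 0.1 * rng.normal(size=shared.shape)

loss_paired = info_nce(z_mrna, z_meth)
loss_shuffled = info_nce(z_mrna, z_meth[rng.permutation(64)])
print(f"paired: {loss_paired:.3f}  shuffled: {loss_shuffled:.3f}")
```

In this toy setup the loss is lower when rows of the two matrices belong to the same subject than when the pairing is shuffled, which is the property a contrastive objective exploits to align representations across omics types.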

Similar articles

CLCLSA: Cross-omics Linked embedding with Contrastive Learning and Self Attention for multi-omics integration with incomplete multi-omics data.

Res Sq. 2023 May 2:rs.3.rs-2768563. doi: 10.21203/rs.3.rs-2768563/v1.

CLCLSA: Cross-omics Linked embedding with Contrastive Learning and Self Attention for multi-omics integration with incomplete multi-omics data.

ArXiv. 2023 Apr 12:arXiv:2304.05542v1.

DEDUCE: Multi-head attention decoupled contrastive learning to discover cancer subtypes based on multi-omics data.

Comput Methods Programs Biomed. 2024 Dec;257:108478. doi: 10.1016/j.cmpb.2024.108478. Epub 2024 Oct 30.

Multi-omics integration method based on attention deep learning network for biomedical data classification.

Comput Methods Programs Biomed. 2023 Apr;231:107377. doi: 10.1016/j.cmpb.2023.107377. Epub 2023 Jan 27.

Integration of multi-omics data using adaptive graph learning and attention mechanism for patient classification and biomarker identification.

Comput Biol Med. 2023 Sep;164:107303. doi: 10.1016/j.compbiomed.2023.107303. Epub 2023 Aug 2.

AVBAE-MODFR: A novel deep learning framework of embedding and feature selection on multi-omics data for pan-cancer classification.

Comput Biol Med. 2024 Jul;177:108614. doi: 10.1016/j.compbiomed.2024.108614. Epub 2024 May 14.

MoAGL-SA: a multi-omics adaptive integration method with graph learning and self attention for cancer subtype classification.

BMC Bioinformatics. 2024 Nov 23;25(1):364. doi: 10.1186/s12859-024-05989-y.

PCLSurv: a prototypical contrastive learning-based multi-omics data integration model for cancer survival prediction.

Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf124.

MODILM: towards better complex diseases classification using a novel multi-omics data integration learning model.

BMC Med Inform Decis Mak. 2023 May 5;23(1):82. doi: 10.1186/s12911-023-02173-9.

Cited by

A foundation model for learning genetic associations from brain imaging phenotypes.

Bioinform Adv. 2025 Aug 13;5(1):vbaf196. doi: 10.1093/bioadv/vbaf196. eCollection 2025.

A technical review of multi-omics data integration methods: from classical statistical to deep generative approaches.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf355.

Cancer molecular subtyping using limited multi-omics data with missingness.

PLoS Comput Biol. 2024 Dec 26;20(12):e1012710. doi: 10.1371/journal.pcbi.1012710. eCollection 2024 Dec.

SGUQ: Staged Graph Convolution Neural Network for Alzheimer's Disease Diagnosis using Multi-Omics Data.

ArXiv. 2024 Oct 14:arXiv:2410.11046v1.
