CLCLSA: Cross-omics linked embedding with contrastive learning and self attention for integration with incomplete multi-omics data.

Affiliations

Department of Computer Science, Kennesaw State University, Marietta, GA, 30060, USA.

Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA.

Publication information

Comput Biol Med. 2024 Mar;170:108058. doi: 10.1016/j.compbiomed.2024.108058. Epub 2024 Jan 28.

Abstract

Integration of heterogeneous and high-dimensional multi-omics data is becoming increasingly important in understanding the etiology of complex genetic diseases. Each omics technique provides only a limited view of the underlying biological process, and integrating heterogeneous omics layers simultaneously would lead to a more comprehensive and detailed understanding of diseases and phenotypes. However, one obstacle to multi-omics data integration is the existence of unpaired multi-omics data, which arises from limits on instrument sensitivity and cost. Studies may fail if certain aspects of the subjects are missing or incomplete. In this paper, we propose a deep learning method for multi-omics integration with incomplete data: Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA). Using complete multi-omics data as supervision, the model employs cross-omics autoencoders to learn feature representations across different types of biological data. Multi-omics contrastive learning is employed to maximize the mutual information between different omics types. In addition, feature-level and omics-level self-attention are employed to dynamically identify the most informative features for multi-omics data integration. Finally, a softmax classifier performs multi-omics data classification. Extensive experiments were conducted on four public multi-omics datasets. The experimental results indicate that the proposed CLCLSA produces promising results in multi-omics data classification using both complete and incomplete multi-omics data.
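The abstract does not state the exact contrastive objective, so the following is only a minimal sketch of one common choice, the InfoNCE loss, which is widely used as a tractable proxy for maximizing mutual information between two views. The function names (`l2_normalize`, `info_nce`), the temperature value, and the toy one-hot embeddings are all illustrative assumptions and are not taken from the paper. Each row of `z_a` and `z_b` stands for the embedding of the same sample from two different omics types.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(z_a, z_b, temperature=0.5):
    """InfoNCE loss between two batches of embeddings (hypothetical helper).

    z_a[i] and z_b[i] embed the same sample from two omics types and form
    the positive pair; z_a[i] paired with z_b[j], j != i, act as negatives.
    """
    z_a = [l2_normalize(v) for v in z_a]
    z_b = [l2_normalize(v) for v in z_b]
    total = 0.0
    for i, a in enumerate(z_a):
        # Cosine similarity of sample i (omics A) to every sample in omics B.
        logits = [sum(x * y for x, y in zip(a, b)) / temperature for b in z_b]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the true (positive) pair.
        total += log_denom - logits[i]
    return total / len(z_a)


# Toy embeddings (assumed for illustration): three samples, two omics views.
omics_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
omics_b_aligned = [row[:] for row in omics_a]               # views agree
omics_b_shuffled = omics_b_aligned[1:] + omics_b_aligned[:1]  # pairing broken

loss_aligned = info_nce(omics_a, omics_b_aligned)
loss_shuffled = info_nce(omics_a, omics_b_shuffled)
print(loss_aligned < loss_shuffled)
```

When the two views of each sample agree, the loss is lower than when the pairing is broken, which is the behavior a cross-omics contrastive term relies on. In a real model the embeddings would come from the per-omics encoders and the loss would be minimized by gradient descent alongside the classification loss.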
