Liu Zhe, Park Taesung
Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea.
Department of Statistics, Seoul National University, Seoul, Republic of Korea.
Front Genet. 2024 Dec 10;15:1488683. doi: 10.3389/fgene.2024.1488683. eCollection 2024.
Multi-omics data integration has become increasingly crucial for a deeper understanding of the complexity of biological systems. However, effectively integrating and analyzing multi-omics data remains challenging due to their heterogeneity and high dimensionality. Existing methods often struggle with noise, redundant features, and the complex interactions between different omics layers, leading to suboptimal performance. Additionally, they have difficulty adequately capturing intra-omics interactions because they rely on simplistic concatenation techniques, and they risk losing critical inter-omics interaction information when using hierarchical attention layers. To address these challenges, we propose a novel Denoised Multi-Omics Integration approach that leverages the Transformer multi-head self-attention mechanism (DMOIT). DMOIT consists of three key modules: a generative adversarial imputation network for handling missing values, a sampling-based robust feature selection module to reduce noise and redundant features, and a multi-head self-attention (MHSA) based feature extractor with a novel architecture that enhances the capture of intra-omics interactions. We validated model performance using cancer datasets from The Cancer Genome Atlas (TCGA), conducting two tasks: survival time classification across different cancer types and estrogen receptor status classification for breast cancer. Our results show that DMOIT outperforms traditional machine learning methods and the state-of-the-art integration method MoGCN in terms of accuracy and weighted F1 score. Furthermore, we compared DMOIT with various alternative MHSA-based architectures to further validate our approach. These comparisons show that DMOIT consistently outperforms the alternative models across various cancer types and different omics combinations. The strong performance and robustness of DMOIT demonstrate its potential as a valuable tool for integrating multi-omics data across various applications.
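To make the multi-head self-attention (MHSA) mechanism named in the abstract concrete, the following is a minimal NumPy sketch of generic MHSA applied to one token per omics layer. It is an illustration under stated assumptions and does not reproduce the DMOIT architecture: the omics names, feature counts, random embeddings, `d_model`, and `num_heads` are all hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads):
    """Generic scaled dot-product MHSA.

    tokens: array of shape (n_tokens, d_model).
    Returns the attended tokens (n_tokens, d_model) and the attention
    weights (num_heads, n_tokens, n_tokens).
    """
    n, d_model = tokens.shape
    d_head = d_model // num_heads

    def split_heads(x):
        # (n, d_model) -> (num_heads, n, d_head)
        return x.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(tokens @ w_q)
    k = split_heads(tokens @ w_k)
    v = split_heads(tokens @ w_v)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)          # each row sums to 1
    heads = attn @ v                         # (num_heads, n, d_head)
    merged = heads.transpose(1, 0, 2).reshape(n, d_model)
    return merged @ w_o, attn


rng = np.random.default_rng(0)
d_model, num_heads = 8, 2                    # hypothetical sizes

# Hypothetical feature vectors for one sample (dimensions are made up).
omics = {
    "mrna": rng.normal(size=20),
    "methylation": rng.normal(size=15),
    "mirna": rng.normal(size=10),
}
# Assumed linear embeddings mapping each omics layer to one d_model token.
embed = {
    name: rng.normal(size=(x.size, d_model)) / np.sqrt(x.size)
    for name, x in omics.items()
}
tokens = np.stack([omics[name] @ embed[name] for name in omics])

w_q, w_k, w_v, w_o = (
    rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(4)
)
out, attn = multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads)
print(out.shape, attn.shape)  # (3, 8) (2, 3, 3)
```

In this sketch each row of `attn` describes how strongly one omics token attends to the others, so it models interactions between layers only. Capturing interactions within a layer, which the abstract emphasizes, would require a finer tokenization, for example several tokens per omics layer; that choice is an assumption here, not a description of the published method.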