Wang Yuhan, Wang Zhikang, Yu Xuan, Wang Xiaoyu, Song Jiangning, Yu Dong-Jun, Ge Fang
School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China.
Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Wellington Rd, Clayton, Melbourne, VIC 3800, Australia.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae658.
High-throughput sequencing methods have transformed omics-based biomedical research. Integrating multiple types of omics data may help identify correlations across data modalities, thereby improving our understanding of the underlying biological mechanisms and their complexity. However, most existing graph-based feature extraction methods overlook the complementary information and correlations across modalities. Moreover, these methods tend to treat the features of each omics modality equally, which contradicts current biological principles. To address these challenges, we introduce Multi-Omics hypeRgraph integration nEtwork (MORE), a novel approach for integrating multi-omics data. MORE first constructs a comprehensive hyperedge group by extensively investigating the informative correlations within and across modalities. A multi-omics hypergraph encoding module then learns the enriched omics-specific information. Finally, a multi-omics self-attention mechanism adaptively aggregates valuable correlations across modalities for representation learning and for making the final prediction. We assess MORE's performance on datasets comprising messenger RNA (mRNA) expression, deoxyribonucleic acid (DNA) methylation, and microRNA (miRNA) expression for Alzheimer's disease, invasive breast carcinoma, and glioblastoma. The results from three classification tasks highlight the competitive advantage of MORE over current state-of-the-art (SOTA) methods. The results also show that MORE can identify a greater variety of disease-related biomarkers than existing methods, highlighting its advantages in biomedical data mining and interpretation. Overall, MORE can be investigated as a valuable tool for facilitating multi-omics analysis and novel biomarker discovery. Our code and data are publicly available at https://github.com/Wangyuhanxx/MORE.
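To make the idea of a hyperedge group concrete, the following is a minimal sketch and not the MORE implementation. It assumes k-nearest-neighbour hyperedges built separately in each modality and a plain two-stage mean aggregation (nodes to hyperedges, then hyperedges back to nodes). The abstract does not specify how hyperedges are constructed, so this construction is only one plausible choice. The function names `knn_hyperedges` and `hypergraph_smooth` and the toy values are hypothetical, and the self-attention step is not covered.

```python
import math


def knn_hyperedges(features, k):
    """Assumed construction: one hyperedge per sample, made of the
    sample itself plus its k nearest neighbours (Euclidean distance)."""
    edges = []
    for i, x in enumerate(features):
        # Rank all other samples by their distance to sample i.
        others = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: math.dist(x, features[j]),
        )
        edges.append(frozenset([i] + others[:k]))
    return edges


def hypergraph_smooth(features, edges):
    """Two-stage mean aggregation over a hypergraph.

    Every node must belong to at least one hyperedge, which holds
    when the edges come from knn_hyperedges on the same samples."""
    dim = len(features[0])
    # Stage 1: each hyperedge feature is the mean of its member nodes.
    edge_feats = [
        [sum(features[v][d] for v in e) / len(e) for d in range(dim)]
        for e in edges
    ]
    # Stage 2: each node averages the hyperedges it belongs to.
    smoothed = []
    for v in range(len(features)):
        incident = [f for e, f in zip(edges, edge_feats) if v in e]
        smoothed.append(
            [sum(f[d] for f in incident) / len(incident) for d in range(dim)]
        )
    return smoothed


# Toy data (made up for illustration): 4 samples measured in two modalities.
mrna = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
methylation = [[1.0], [1.1], [9.0], [9.2]]

# The hyperedge group pools hyperedges found in every modality, so
# propagation over one modality can use neighbourhoods from the others.
hyperedge_group = knn_hyperedges(mrna, k=1) + knn_hyperedges(methylation, k=1)
smoothed = hypergraph_smooth(mrna, hyperedge_group)
```

In this sketch the mRNA features are smoothed over neighbourhoods found in both modalities, which is the sense in which a shared hyperedge group can carry cross-modal correlations. A trained model would add learnable weights and nonlinearities on top of this propagation step.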