Zhang Heming, Cao Dekang, Chen Zirui, Zhang Xiuyuan, Chen Yixin, Sessions Cole, Cruchaga Carlos, Payne Philip, Li Guangfu, Province Michael, Li Fuhai
Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Saint Louis, MO 63110-1010, United States.
Department of Computer Science and Engineering, Washington University in St. Louis, Saint Louis, MO 63130, United States.
Bioinform Adv. 2024 Oct 8;4(1):vbae151. doi: 10.1093/bioadv/vbae151. eCollection 2024.
Multi-omics data, i.e. genomics, epigenomics, transcriptomics and proteomics, characterize complex cellular signaling systems at multiple levels and from multiple views, providing a holistic view of cellular signaling pathways. However, it remains challenging to integrate and interpret multi-omics data to identify critical biomarkers. Graph AI models have been widely used to analyze graph-structured datasets and are well suited to integrative multi-omics analysis, because they can naturally represent multi-omics data as a biologically meaningful multi-level signaling graph and interpret the data via graph node and edge ranking analysis. Nevertheless, it is nontrivial for graph AI model developers to pre-analyze multi-omics data and convert them into biologically meaningful graphs that can be fed directly into graph AI models.
To resolve this challenge, we developed mosGraphGen (multi-omics signaling graph generator), which generates multi-omics signaling graphs (mos-graphs) of individual samples by mapping multi-omics data onto a biologically meaningful multi-level background signaling network, normalizing the data by aggregating measurements and aligning them to the reference genome. With mosGraphGen, AI model developers can directly apply and evaluate their models using these mos-graphs. In the results, we illustrate mosGraphGen on two widely used multi-omics datasets of The Cancer Genome Atlas (TCGA) and Alzheimer's disease (AD) samples.
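The mapping step described above can be illustrated with a minimal sketch. It is not the mosGraphGen implementation: the level names, the function names `aggregate_by_gene` and `build_sample_graph`, the within-gene edge order, the choice to link genes at the protein level, and the zero default for missing measurements are all assumptions made for illustration. It only shows the general idea of averaging repeated measurements per gene and attaching them to nodes of a multi-level graph built on a background network.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical level names; the real tool may use different labels.
LEVELS = ("promoter", "gene", "transcript", "protein")


def aggregate_by_gene(measurements):
    """Average repeated measurements (e.g. several probes) per gene."""
    grouped = defaultdict(list)
    for gene, value in measurements:
        grouped[gene].append(value)
    return {gene: mean(values) for gene, values in grouped.items()}


def build_sample_graph(omics, background_edges):
    """Map one sample's omics values onto a multi-level signaling graph.

    omics: dict mapping a level name to a list of (gene, value) pairs.
    background_edges: iterable of (source_gene, target_gene) pairs.
    Returns (node_features, edges).
    """
    background_edges = list(background_edges)
    genes = sorted({gene for edge in background_edges for gene in edge})
    per_level = {lvl: aggregate_by_gene(omics.get(lvl, [])) for lvl in LEVELS}

    # One node per (gene, level); missing measurements default to 0.0
    # (an assumption of this sketch, not a documented behavior).
    node_features = {
        (gene, lvl): per_level[lvl].get(gene, 0.0)
        for gene in genes
        for lvl in LEVELS
    }

    # Within-gene edges: promoter -> gene -> transcript -> protein.
    edges = [
        ((gene, a), (gene, b))
        for gene in genes
        for a, b in zip(LEVELS, LEVELS[1:])
    ]
    # Between-gene edges: connect proteins along the background network.
    edges += [((src, "protein"), (dst, "protein")) for src, dst in background_edges]
    return node_features, edges


# Toy values for a single sample; not real data.
sample_omics = {
    "promoter": [("TP53", 0.25), ("TP53", 0.75)],  # two probes, averaged
    "transcript": [("TP53", 5.0), ("MDM2", 3.0)],
    "protein": [("MDM2", 1.5)],
}
features, edges = build_sample_graph(sample_omics, [("TP53", "MDM2")])
```

The resulting node-feature dictionary and edge list could then be converted into whatever input structure a given graph AI library expects.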
The code of mosGraphGen is open-source and publicly available via GitHub: https://github.com/FuhaiLiAiLab/mosGraphGen.