Linear functional organization of the omic embedding space.

Affiliations

Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain.

Universitat Politecnica de Catalunya (UPC), Barcelona 08034, Spain.

Publication information

Bioinformatics. 2021 Nov 5;37(21):3839-3847. doi: 10.1093/bioinformatics/btab487.

DOI:10.1093/bioinformatics/btab487

PMID:34213534

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8570782/

Abstract

MOTIVATION

We are increasingly accumulating complex omics data that capture different aspects of cellular functioning. A key challenge is to untangle their complexity and effectively mine them for new biomedical information. To decipher this new information, we introduce algorithms based on network embeddings. Such algorithms represent biological macromolecules as vectors in d-dimensional space, in which topologically similar molecules are embedded close together and knowledge is extracted directly by vector operations. Recently, it has been shown that neural networks used to obtain vectorial representations (embeddings) implicitly factorize a mutual information matrix, called the Positive Pointwise Mutual Information (PPMI) matrix. Thus, we propose using the PPMI matrix to represent the human protein-protein interaction (PPI) network, and we also introduce the graphlet degree vector PPMI matrix of the PPI network to capture different topological (structural) similarities of the nodes in the molecular network.
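As a rough illustration of how a PPMI matrix can be derived from a network, the sketch below computes one for a toy undirected graph. It assumes a random-walk notion of co-occurrence (two nodes co-occur when a short walk of at most `window` steps connects them); that choice, the function name `ppmi_from_adjacency`, the `window` parameter, and the toy graph are illustrative assumptions and may differ from what the authors' code does. The graphlet degree vector variant is not covered here.

```python
import numpy as np

def ppmi_from_adjacency(A, window=2):
    """PPMI matrix of an undirected graph (illustrative sketch).

    Assumption: co-occurrence of nodes i and j is the degree-weighted
    probability that a random walk of 1..window steps from i reaches j.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    if np.any(deg == 0):
        raise ValueError("isolated nodes are not supported in this sketch")
    n = A.shape[0]
    P = A / deg[:, None]                 # row-stochastic transition matrix
    walk_sum = np.zeros_like(A)
    P_r = np.eye(n)
    for _ in range(window):
        P_r = P_r @ P                    # r-step transition probabilities
        walk_sum += P_r
    C = deg[:, None] * walk_sum / window  # symmetric co-occurrence matrix
    total = C.sum()
    expected = np.outer(C.sum(axis=1), C.sum(axis=0)) / total
    with np.errstate(divide="ignore"):   # log(0) = -inf is clipped below
        pmi = np.log(C / expected)
    return np.maximum(pmi, 0.0)          # keep only positive associations

# Toy network: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

ppmi = ppmi_from_adjacency(A, window=2)
print(np.round(ppmi, 2))
```

Entries are positive only for node pairs that co-occur more often than their degrees alone would predict, which is what makes the matrix a candidate input for a nonnegative factorization.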

RESULTS

We generate the embeddings by decomposing these matrices with Nonnegative Matrix Tri-Factorization. We demonstrate that genes that are embedded close together in these spaces have similar biological functions, so we can extract new biomedical knowledge directly by performing linear operations on their embedding vector representations. We exploit this property to predict new genes participating in protein complexes and to identify new cancer-related genes based on the cosine similarities between the vector representations of the genes. We validate 80% of our novel cancer-related gene predictions in the literature and also by patient survival curves, which demonstrate that 93.3% of them have potential clinical relevance as biomarkers of cancer.
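The two steps described above, factorizing a nonnegative matrix and then comparing genes by cosine similarity, can be sketched as follows. This is a minimal example under stated assumptions, not the authors' implementation: it uses plain multiplicative updates for X ≈ G S Pᵀ without any orthogonality constraint, takes the rows of `G @ S` as embedding vectors, and runs on a small synthetic matrix with hypothetical gene labels `g0` to `g5`.

```python
import numpy as np

def nmtf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Nonnegative Matrix Tri-Factorization X ~ G @ S @ P.T (sketch).

    Uses unconstrained multiplicative updates for the Frobenius loss;
    an orthogonality-constrained variant would use different updates.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    G = rng.random((n, k)) + eps
    S = rng.random((k, k)) + eps
    P = rng.random((m, k)) + eps
    errors = []
    for _ in range(n_iter):
        G *= (X @ P @ S.T) / (G @ S @ P.T @ P @ S.T + eps)
        S *= (G.T @ X @ P) / (G.T @ G @ S @ P.T @ P + eps)
        P *= (X.T @ G @ S) / (P @ S.T @ G.T @ G @ S + eps)
        errors.append(np.linalg.norm(X - G @ S @ P.T))
    return G, S, P, errors

def cosine_similarity_matrix(E, eps=1e-12):
    """Pairwise cosine similarity between the rows of E."""
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    U = E / np.maximum(norms, eps)
    return U @ U.T

# Synthetic stand-in for a PPMI matrix: two groups of three "genes".
genes = ["g0", "g1", "g2", "g3", "g4", "g5"]  # hypothetical labels
X = np.kron(np.eye(2), np.ones((3, 3))) + 0.05

G, S, P, errors = nmtf(X, k=2)
E = G @ S                       # assumption: rows of G @ S as embeddings
sim = cosine_similarity_matrix(E)

# Rank all other genes by cosine similarity to a query gene.
query = 0
order = np.argsort(-sim[query])
neighbors = [genes[j] for j in order if j != query]
print("reconstruction error:", errors[-1])
print("genes ranked by similarity to", genes[query], ":", neighbors)
```

In a real setting, `X` would be the PPMI matrix of the PPI network and the query would be a known complex member or cancer gene, with the top-ranked neighbors treated as candidate predictions to be validated.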

AVAILABILITY AND IMPLEMENTATION

Code and data are available online at https://gitlab.bsc.es/axenos/embedded-omics-data-geometry/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

A functional analysis of omic network embedding spaces reveals key altered functions in cancer.

Bioinformatics. 2023 May 4;39(5). doi: 10.1093/bioinformatics/btad281.

Text mining-based word representations for biomedical data analysis and protein-protein interaction networks in machine learning tasks.

PLoS One. 2021 Oct 15;16(10):e0258623. doi: 10.1371/journal.pone.0258623. eCollection 2021.

Identifying cellular cancer mechanisms through pathway-driven data integration.

Bioinformatics. 2022 Sep 15;38(18):4344-4351. doi: 10.1093/bioinformatics/btac493.

Minimum curvilinearity to enhance topological prediction of protein interactions by network embedding.

Bioinformatics. 2013 Jul 1;29(13):i199-209. doi: 10.1093/bioinformatics/btt208.

DPCMNE: Detecting Protein Complexes From Protein-Protein Interaction Networks Via Multi-Level Network Embedding.

IEEE/ACM Trans Comput Biol Bioinform. 2022 May-Jun;19(3):1592-1602. doi: 10.1109/TCBB.2021.3050102. Epub 2022 Jun 3.

Protein complexes identification based on go attributed network embedding.

BMC Bioinformatics. 2018 Dec 20;19(1):535. doi: 10.1186/s12859-018-2555-x.

Unsupervised construction of computational graphs for gene expression data with explicit structural inductive biases.

Bioinformatics. 2022 Feb 7;38(5):1320-1327. doi: 10.1093/bioinformatics/btab830.

GLIDE: combining local methods and diffusion state embeddings to predict missing interactions in biological networks.

Bioinformatics. 2020 Jul 1;36(Suppl_1):i464-i473. doi: 10.1093/bioinformatics/btaa459.

L-GRAAL: Lagrangian graphlet-based network aligner.

Bioinformatics. 2015 Jul 1;31(13):2182-9. doi: 10.1093/bioinformatics/btv130. Epub 2015 Feb 28.

Cited by

Pathway Analysis Interpretation in the Multi-Omic Era.

BioTech (Basel). 2025 Jul 29;14(3):58. doi: 10.3390/biotech14030058.

Simplicity within biological complexity.

Bioinform Adv. 2025 Feb 6;5(1):vbae164. doi: 10.1093/bioadv/vbae164. eCollection 2025.

Interpreting and visualizing pathway analyses using embedding representations with PAVER.

Bioinformation. 2024 Jul 31;20(7):700-704. doi: 10.6026/973206300200700. eCollection 2024.

Current and future directions in network biology.

Bioinform Adv. 2024 Aug 14;4(1):vbae099. doi: 10.1093/bioadv/vbae099. eCollection 2024.

The axes of biology: a novel axes-based network embedding paradigm to decipher the functional mechanisms of the cell.

Bioinform Adv. 2024 May 23;4(1):vbae075. doi: 10.1093/bioadv/vbae075. eCollection 2024.

A functional analysis of omic network embedding spaces reveals key altered functions in cancer.

Bioinformatics. 2023 May 4;39(5). doi: 10.1093/bioinformatics/btad281.
