Linear functional organization of the omic embedding space.

Affiliations

Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain.

Universitat Politecnica de Catalunya (UPC), Barcelona 08034, Spain.

Publication information

Bioinformatics. 2021 Nov 5;37(21):3839-3847. doi: 10.1093/bioinformatics/btab487.

Abstract

MOTIVATION

We are increasingly accumulating complex omics data that capture different aspects of cellular functioning. A key challenge is to untangle their complexity and effectively mine them for new biomedical information. To decipher this new information, we introduce algorithms based on network embeddings. Such algorithms represent biological macromolecules as vectors in d-dimensional space, in which topologically similar molecules are embedded close together and knowledge is extracted directly by vector operations. Recently, it has been shown that neural networks used to obtain vectorial representations (embeddings) implicitly factorize a mutual information matrix, called the Positive Pointwise Mutual Information (PPMI) matrix. Thus, we propose the use of the PPMI matrix to represent the human protein-protein interaction (PPI) network, and we also introduce the graphlet degree vector PPMI matrix of the PPI network to capture different topological (structural) similarities of the nodes in the molecular network.
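To make the PPMI idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that the adjacency matrix of a toy network can stand in directly for co-occurrence counts; the paper may derive its counts differently (for example, from random walks). The function name `ppmi_matrix` and the 4-node path graph are hypothetical and used only for illustration. Each entry is max(0, log(c_ij * total / (row_i * col_j))).

```python
import math


def ppmi_matrix(counts):
    """Return the PPMI matrix of a square co-occurrence count matrix.

    Assumption: `counts` is a list of lists of non-negative numbers.
    Entries with a zero count are left at 0.0.
    """
    n = len(counts)
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(counts[i][j] for i in range(n)) for j in range(n)]

    ppmi = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c = counts[i][j]
            if c > 0:
                # Pointwise mutual information of the pair (i, j).
                pmi = math.log(c * total / (row_sums[i] * col_sums[j]))
                # Keep only positive associations.
                ppmi[i][j] = max(0.0, pmi)
    return ppmi


# Toy path graph with four nodes: 0 - 1 - 2 - 3 (hypothetical data).
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
ppmi = ppmi_matrix(adjacency)
for row in ppmi:
    print([round(value, 3) for value in row])
```

In this toy example, an edge that touches a low-degree end node receives a higher PPMI value than the edge between the two interior nodes, because co-occurring with a rarely seen node is more informative.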

RESULTS

We generate the embeddings by decomposing these matrices with Nonnegative Matrix Tri-Factorization. We demonstrate that genes that are embedded close together in these spaces have similar biological functions, so we can extract new biomedical knowledge directly by performing linear operations on their embedding vector representations. We exploit this property to predict new genes participating in protein complexes and to identify new cancer-related genes based on the cosine similarities between the vector representations of the genes. We validate 80% of our novel cancer-related gene predictions in the literature and also by patient survival curves, which demonstrate that 93.3% of them have potential clinical relevance as biomarkers of cancer.
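The similarity-based prediction step can be illustrated with a small sketch, which is not the authors' code. It assumes the gene embeddings have already been computed and are available as plain vectors, and it assumes that candidates are scored against the average (a linear operation) of a set of known genes; the abstract does not specify this exact scoring scheme. The gene names, vectors and the helper `rank_candidates` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(embeddings, known, top_k=3):
    """Rank genes outside `known` by cosine similarity to the known-gene centroid.

    Assumption: scoring against the centroid is an illustrative choice,
    not necessarily the procedure used in the paper.
    """
    dim = len(next(iter(embeddings.values())))
    centroid = [
        sum(embeddings[gene][d] for gene in known) / len(known)
        for d in range(dim)
    ]
    scores = [
        (gene, cosine_similarity(vector, centroid))
        for gene, vector in embeddings.items()
        if gene not in known
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical 3-dimensional embeddings for five placeholder genes.
embeddings = {
    "gene_a": [1.0, 0.1, 0.0],
    "gene_b": [0.9, 0.2, 0.1],
    "gene_c": [0.8, 0.1, 0.0],
    "gene_d": [0.0, 1.0, 0.9],
    "gene_e": [0.1, 0.0, 1.0],
}
known = {"gene_a", "gene_b"}  # e.g. genes already annotated to a function

ranking = rank_candidates(embeddings, known)
for gene, score in ranking:
    print(gene, round(score, 3))
```

Here `gene_c` ranks first because its vector points in nearly the same direction as the known genes. Cosine similarity ignores vector length, so only the direction in the embedding space matters.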

AVAILABILITY AND IMPLEMENTATION

Code and data are available online at https://gitlab.bsc.es/axenos/embedded-omics-data-geometry/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
