Department of Life Science, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain.
Department of Computer Science, University College London, London WC1E 6BT, United Kingdom.
Bioinformatics. 2023 May 4;39(5). doi: 10.1093/bioinformatics/btad281.
Advances in omics technologies have revolutionized cancer research by producing massive datasets. A common approach to deciphering these complex data is to apply embedding algorithms to molecular interaction networks. These algorithms find a low-dimensional space in which similarities between the network nodes are best preserved. Currently available embedding approaches mine the gene embeddings directly to uncover new cancer-related knowledge. However, these gene-centric approaches produce incomplete knowledge, since they do not account for the functional implications of genomic alterations. We propose a new, function-centric perspective and approach to complement the knowledge obtained from omic data.
We introduce our Functional Mapping Matrix (FMM) to explore the functional organization of different tissue-specific and species-specific embedding spaces generated by a Non-negative Matrix Tri-Factorization algorithm. We also use our FMM to define the optimal dimensionality of these molecular interaction network embedding spaces. For this optimal dimensionality, we compare the FMMs of the most prevalent cancers in humans to the FMMs of their corresponding control tissues. We find that cancer alters the positions of cancer-related functions in the embedding space, while it preserves the positions of the non-cancer-related ones. We exploit this spatial 'movement' to predict novel cancer-related functions. Finally, we predict novel cancer-related genes that the currently available methods for gene-centric analyses cannot identify; we validate these predictions by literature curation and retrospective analyses of patient survival data.
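To make the comparison between cancer and control embedding spaces concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each function (for example, a functional annotation) is already represented by a vector in the embedding space, that the mapping matrix can be approximated by pairwise cosine distances between those vectors, and that 'movement' can be scored as the change in a function's row between the two matrices. The function names `functional_mapping_matrix` and `movement_scores`, the distance measure, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def functional_mapping_matrix(func_embeddings):
    """Pairwise cosine distances between function embedding vectors (one per row).

    Assumption: cosine distance stands in for whatever measure the paper uses.
    """
    norms = np.linalg.norm(func_embeddings, axis=1, keepdims=True)
    unit = func_embeddings / np.clip(norms, 1e-12, None)  # avoid division by zero
    cosine_similarity = np.clip(unit @ unit.T, -1.0, 1.0)
    return 1.0 - cosine_similarity


def movement_scores(fmm_control, fmm_cancer):
    """Hypothetical per-function 'movement' score.

    Each function's score is the Euclidean norm of the difference between
    its row in the cancer matrix and its row in the control matrix.
    """
    return np.linalg.norm(fmm_cancer - fmm_control, axis=1)


# Toy data (made up): 5 functions embedded in a 3-dimensional space.
rng = np.random.default_rng(0)
control_embeddings = rng.random((5, 3))
cancer_embeddings = control_embeddings.copy()
cancer_embeddings[2] = rng.random(3)  # only function 2 changes position

fmm_control = functional_mapping_matrix(control_embeddings)
fmm_cancer = functional_mapping_matrix(cancer_embeddings)
scores = movement_scores(fmm_control, fmm_cancer)

# Rank functions from most to least moved.
ranking = np.argsort(scores)[::-1]
print("functions ranked by movement:", ranking.tolist())
```

In this toy setup only one function is displaced, so it receives the largest score; the other functions score above zero only because their distance to the displaced function changed. Real data would need a threshold or a statistical test to separate moved functions from stable ones, which this sketch does not attempt.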
Data and source code can be accessed at https://github.com/gaiac/FMM.