Faculty of Computing, Harbin Institute of Technology, 92 Xidazhi Street, TIB #20, Harbin, 150000, Hei Long Jiang, China.
School of Medicine and Health, Harbin Institute of Technology, 92 Xidazhi Street, TIB #20, Harbin, 150000, Hei Long Jiang, China.
Comput Biol Med. 2022 Dec;151(Pt A):106263. doi: 10.1016/j.compbiomed.2022.106263. Epub 2022 Nov 9.
In recent years, as pancancer research has expanded, pancancer metastasis has drawn increasing attention. However, the molecular mechanism of pancancer metastasis remains unclear, and methods for identifying pancancer metastasis-related genes are still lacking. To address this gap, we developed a novel pipeline that identifies pancancer metastasis-related genes based on compound constrained nonnegative matrix factorization (CCNMF). The pipeline consists of the following modules. First, a correntropy operator and feature similarity fusion (FSF) were applied to the multiomics features of genes, minimizing the influence of irrelevant biomolecular patterns, which manifest as non-Gaussian noise. CCNMF was then applied to these features under compound constraints consisting of a gene relation network and a "metastasis-related" gene set, which maximizes the biological interpretability of the metafeatures generated by NMF. Because a negative set of pancancer "metastasis-related" genes is difficult to obtain, semisupervised analyses were performed on the gene features produced at each step of the pipeline to assess the method's effect. Of the 236 candidates identified by this method, 83% were associated with the metastasis of one or more cancers, and, in addition to the hallmark genes, 71.9% of the candidates were identified as immune-related in pancancer. Our study provides an effective and interpretable method for identifying metastasis-related as well as immune-related genes, and the method was successfully applied to TCGA pancancer data.
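The abstract does not give the CCNMF objective or its update rules, so the following is only a minimal sketch of the general idea of constraining NMF with a gene relation network. It uses the standard graph-regularized NMF formulation with multiplicative updates as a stand-in, not the authors' CCNMF. The correntropy operator, FSF, and the "metastasis-related" gene-set constraint are omitted, and all names (`graph_regularized_nmf`, `objective`, `lam`, `k`) are hypothetical.

```python
import numpy as np


def graph_regularized_nmf(X, W, k, lam=1.0, n_iter=200, eps=1e-10, seed=0):
    """Factor X (n_features x n_genes, nonnegative) as U @ V.T.

    W is a symmetric, nonnegative gene-gene adjacency matrix (n_genes x n_genes).
    The penalty lam * trace(V.T @ L @ V), with L = D - W, pulls the
    metafeatures of connected genes toward each other.
    This is generic graph-regularized NMF, assumed here for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_features, n_genes = X.shape
    U = rng.random((n_features, k))   # basis matrix
    V = rng.random((n_genes, k))      # per-gene metafeatures
    D = np.diag(W.sum(axis=1))        # degree matrix of the gene network

    for _ in range(n_iter):
        # Multiplicative updates keep U and V nonnegative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V


def objective(X, W, U, V, lam=1.0):
    """Reconstruction error plus the graph smoothness penalty."""
    L = np.diag(W.sum(axis=1)) - W    # graph Laplacian
    return np.linalg.norm(X - U @ V.T) ** 2 + lam * np.trace(V.T @ L @ V)
```

In such a sketch, the rows of `V` would play the role of per-gene metafeatures, and `lam` trades reconstruction accuracy against agreement with the network. How the paper combines this with the gene-set constraint is not stated in the abstract.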