IEEE J Biomed Health Inform. 2024 Nov;28(11):6405-6416. doi: 10.1109/JBHI.2023.3309842. Epub 2024 Nov 6.
Gene expression data can be used to analyze genes with altered expression, correlations between genes, and the influence of different conditions on gene activity. However, labeling large amounts of gene expression data is laborious and time-consuming, and the resulting shortage of labeled data makes it difficult to build deep learning models. Currently, some semi-supervised graph neural networks (GNNs) focus only on the feature space and sample space of gene expression data, which may limit their accuracy. This article proposes a novel semi-supervised graph neural network model (SFWN). First, we use external knowledge about gene expression data to construct a feature graph, a similarity kernel, and a sample graph, which to our knowledge has not been done before. Next, a novel semi-supervised learning algorithm (SGA) is proposed to better extract relationships in the data and capture the global sample structure. A graph sparse module (SGCN) is also proposed to handle sparse representations in gene expression data classification. To overcome the over-smoothing problem, the model uses a new feature calculation method based on the two spaces for feature representation analysis and computation. In extensive experiments and ablation studies on several public datasets, SFWN outperforms state-of-the-art approaches (accuracy and F1-score of 0.9993 and 0.9899, respectively). The experimental results show that the proposed SFWN model has strong gene expression feature learning and representation ability, and may provide new insight and a new tool for the diagnosis of related diseases and for clinical practice.
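To make the general idea of a sample graph and semi-supervised learning on it concrete, the following is a minimal sketch and not the SFWN, SGA, or SGCN method, whose details the abstract does not give. It assumes a Gaussian (RBF) similarity kernel, a k-nearest-neighbour sample graph, and plain label propagation in place of a graph neural network. All function names, parameters (`gamma`, `k`, `n_iter`), and the toy expression values are hypothetical.

```python
import math

def rbf_similarity(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two expression profiles (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def knn_sample_graph(samples, k=2, gamma=0.5):
    """Symmetric weighted adjacency matrix linking each sample to its k most similar samples."""
    n = len(samples)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = [(rbf_similarity(samples[i], samples[j], gamma), j)
                for j in range(n) if j != i]
        sims.sort(reverse=True)          # most similar first
        for weight, j in sims[:k]:
            adj[i][j] = weight
            adj[j][i] = weight           # keep the graph undirected
    return adj

def propagate_labels(adj, labels, n_classes, n_iter=50):
    """Label propagation: labeled samples stay fixed, unlabeled ones average their neighbours."""
    n = len(adj)
    scores = [[0.0] * n_classes for _ in range(n)]
    for i, lab in enumerate(labels):
        if lab is not None:
            scores[i][lab] = 1.0
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            total = sum(adj[i])
            if labels[i] is not None or total == 0:
                updated.append(scores[i][:])   # clamp labeled or isolated samples
                continue
            updated.append([
                sum(adj[i][j] * scores[j][c] for j in range(n)) / total
                for c in range(n_classes)
            ])
        scores = updated
    return [max(range(n_classes), key=lambda c: s[c]) for s in scores]

# Toy data (hypothetical): 6 samples x 3 genes, forming two well-separated groups.
samples = [
    [0.1, 0.2, 0.1], [0.2, 0.1, 0.0], [0.0, 0.3, 0.2],
    [2.0, 2.1, 1.9], [2.2, 1.9, 2.0], [1.9, 2.0, 2.1],
]
labels = [0, None, None, 1, None, None]   # only one labeled sample per group

adjacency = knn_sample_graph(samples, k=2)
predicted = propagate_labels(adjacency, labels, n_classes=2)
print(predicted)
```

With only one labeled sample per group, the unlabeled samples inherit a class through the graph structure, which is the kind of label scarcity the abstract describes. A GNN-based model would replace the averaging step with learned graph convolutions.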