Liu Haijie, Hou Liping, Xu Shanhu, Li He, Chen Xiuju, Gao Juan, Wang Ziwen, Han Bo, Liu Xiaoli, Wan Shu
Department of Neurology, Xuanwu Hospital, Capital Medical University, Beijing, China.
Department of Clinical Laboratory, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China.
Front Genet. 2021 Sep 1;12:728333. doi: 10.3389/fgene.2021.728333. eCollection 2021.
Cerebral ischemic stroke (IS) is a complex disease caused by multiple factors, including vascular risk factors, genetic factors, and environmental factors, which makes it difficult to discover the corresponding disease-related genes. Identifying the genes associated with IS is critical for understanding its biological mechanism and would benefit the diagnosis and clinical treatment of cerebral IS. However, existing methods for predicting IS-related genes are mainly based on the guilt-by-association (GBA) hypothesis and cannot capture the global structural information of the whole protein-protein interaction (PPI) network. Inspired by the success of network representation learning (NRL) in network analysis, we apply NRL to the discovery of disease-related genes and propose a framework to identify the genes related to cerebral IS. The framework has three main parts: capturing the topological information of the PPI network with NRL, denoising the gene features with a stacked autoencoder (SAE), and optimizing a support vector machine (SVM) classifier to identify IS-related genes. Our framework yields more accurate results than existing methods for IS-related gene prediction. A case study also shows that the proposed method can identify IS-related genes.
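The idea of describing each gene by features drawn from the whole PPI network and then classifying it can be illustrated with a minimal sketch. This is not the authors' implementation and rests on several assumptions. The ten-node network, the labels, and all parameter values are invented for illustration. Random-walk-with-restart profiles stand in for the NRL embedding. The SAE denoising step is omitted. The classifier is a small hand-written linear SVM without a bias term, trained by subgradient descent, rather than an optimized SVM library.

```python
# Toy PPI network: two dense modules joined by a single bridge edge (4-5).
# Node indices stand in for genes; the graph and labels are made up.
EDGES = [(a, b) for a in range(5) for b in range(a + 1, 5)]
EDGES += [(a, b) for a in range(5, 10) for b in range(a + 1, 10)]
EDGES.append((4, 5))
N = 10


def build_neighbors(edges, n):
    """Return an adjacency list for an undirected graph."""
    nbrs = [[] for _ in range(n)]
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    return nbrs


def rwr_profile(nbrs, seed, restart=0.5, iters=50):
    """Random walk with restart from `seed`.

    Returns the visiting distribution over all nodes, used here as a
    feature vector that reflects more than the direct neighbors.
    """
    n = len(nbrs)
    p = [0.0] * n
    p[seed] = 1.0
    for _ in range(iters):
        nxt = [0.0] * n
        for k in range(n):
            # Each node passes its non-restarting mass evenly to its neighbors.
            share = (1.0 - restart) * p[k] / len(nbrs[k])
            for j in nbrs[k]:
                nxt[j] += share
        nxt[seed] += restart
        p = nxt
    return p


def score(w, x):
    """Signed decision value of a linear classifier without a bias term."""
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    """Batch subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [lam * wi for wi in w]
        for xi, yi in zip(X, y):
            if yi * score(w, xi) < 1.0:  # sample violates the margin
                for j, xj in enumerate(xi):
                    grad[j] -= yi * xj
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


nbrs = build_neighbors(EDGES, N)
features = [rwr_profile(nbrs, g) for g in range(N)]

# Hypothetical labels: genes 0-2 are known disease genes, genes 7-9 are not.
train_genes = [0, 1, 2, 7, 8, 9]
labels = [+1, +1, +1, -1, -1, -1]
w = train_linear_svm([features[g] for g in train_genes], labels)

# Held-out genes are flagged as candidates when their score is positive.
held_out = [3, 4, 5, 6]
predictions = {g: score(w, features[g]) > 0 for g in held_out}
print(predictions)
```

In this toy setting, the held-out genes that share a module with the labeled disease genes receive positive scores, including gene 4, which sits on the bridge between the two modules. A real application would replace the toy graph with a curated PPI network and the hand-written parts with tested library implementations.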