Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China.
Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China.
Genes (Basel). 2020 Jan 31;11(2):153. doi: 10.3390/genes11020153.
Essential genes are a group of genes that are indispensable for cell survival and cell fertility. Studying human essential genes not only helps scientists reveal the underlying biological mechanisms of a human cell but also guides disease treatment. The recent publication of human essential gene data has made it possible for researchers to train a machine-learning classifier on features of the known human essential genes and to use that classifier to predict new human essential genes. Previous studies have found that the essentiality of a gene is closely related to its properties in the protein-protein interaction (PPI) network. In this work, we propose a novel supervised method to predict human essential genes by embedding the PPI network. Our approach performs a biased random walk on the network to obtain the network context of each node. The resulting node pairs are then input into an artificial neural network to learn representation vectors that maximally preserve the network structure and the properties of the nodes in the network. Finally, these vectors are used as features for an SVM classifier that predicts human essential genes. The prediction results on two human PPI networks show that our method achieves better performance than methods that use either gene sequence information or gene centrality properties in the network as input features. Moreover, it also outperforms methods that represent the PPI network using other, earlier approaches.
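The first two steps of the pipeline described in the abstract (a biased random walk, then node pairs taken from each walk) can be illustrated with a minimal sketch. The abstract does not state how the walk is biased, so this sketch assumes a node2vec-style second-order walk with a return parameter `p` and an in-out parameter `q`. The function names, the `window` parameter, and the toy edge list are hypothetical and are not taken from the paper. The neural network that learns the vectors and the SVM classifier are not shown.

```python
import random
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) interaction pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    """Second-order biased walk. ASSUMPTION: node2vec-style p/q weighting."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])  # sorted so a seeded rng gives repeatable walks
        if not nbrs:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step has no previous node
            continue
        prev = walk[-2]
        weights = []
        for n in nbrs:
            if n == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif n in adj[prev]:
                weights.append(1.0)      # stay near the previous node
            else:
                weights.append(1.0 / q)  # move away from the previous node
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def context_pairs(walk, window):
    """Return (center, context) pairs for nodes at most `window` steps apart."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# Toy network with placeholder node labels (not real gene names).
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]
adj = build_graph(edges)
rng = random.Random(0)
walk = biased_walk(adj, "A", length=8, p=0.5, q=2.0, rng=rng)
pairs = context_pairs(walk, window=2)
```

In a full pipeline, pairs like these would typically train a skip-gram-style network, and the learned node vectors would then serve as the input features of the classifier.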