Fan Kunjie, Zhang Yan
Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, United States.
The Ohio State University Comprehensive Cancer Center, Columbus, OH, United States.
Front Genet. 2020 Aug 18;11:807. doi: 10.3389/fgene.2020.00807. eCollection 2020.
Pseudogenes were historically regarded as relics of evolution, but they are increasingly shown to have functional potential. Computational methods for predicting pseudogene functions in terms of Gene Ontology (GO) are important for directing experimental discovery. However, no pseudogene-specific computational method has been proposed to directly predict GO terms for pseudogenes. The biggest challenge for pseudogene function prediction is the lack of sufficient features and functional annotations, which makes training a predictive model difficult. Because pseudogenes are functionally similar to their parent coding genes, with which they share a large amount of DNA sequence, and because coding genes are richly annotated, we aim to predict pseudogene functions by borrowing information from coding genes in a graph-based way. Here we propose Pseudo2GO, a graph-based semi-supervised deep learning model for pseudogene function prediction. A sequence similarity graph is first constructed to connect pseudogenes and coding genes. Multiple features are then incorporated into the model as node attributes, making the graph an attributed graph; these features include expression profiles, interactions with microRNAs, protein-protein interactions (PPIs), and genetic interactions. Graph convolutional networks are used to propagate node attributes across the graph in order to classify pseudogenes. Comparing Pseudo2GO with other frameworks adapted from popular protein function prediction methods, we demonstrate that our method achieves state-of-the-art performance, significantly outperforming the other methods in terms of the M-AUPR metric.
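To make the propagation idea concrete, the following is a minimal sketch of a single graph convolution step on a toy sequence similarity graph. It is not the Pseudo2GO implementation. It assumes the commonly used symmetric normalization with self-loops, ReLU(D^-1/2 (A + I) D^-1/2 X W), and the three-node graph, the feature values, and the identity weight matrix are all hypothetical choices made only for illustration.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own attributes.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(a_norm, features, weights):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    hidden = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


# Hypothetical toy graph: nodes 0 and 1 are coding genes, node 2 is a
# pseudogene linked to node 0 by sequence similarity.
adjacency = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]

# Hypothetical node attributes; the pseudogene (node 2) has none of its own.
features = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
]

# Identity weights (an assumption) so that only the propagation is visible.
weights = [[1.0, 0.0], [0.0, 1.0]]

a_norm = normalize_adjacency(adjacency)
hidden = gcn_layer(a_norm, features, weights)
```

In this toy case the pseudogene starts with an all-zero attribute vector and, after one step, carries part of the attributes of the coding gene it is connected to, while the unconnected coding gene is unchanged. A real model would learn the weights from annotated coding genes and stack a classification layer over GO terms on top.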