Montreal Heart Institute, Research Center, Montreal H1T 1C8, Canada.
Faculty of Medicine, University of Montreal, Montreal H3T 1J4, Canada.
Bioinformatics. 2022 May 26;38(11):3051-3061. doi: 10.1093/bioinformatics/btac304.
There is a plethora of measures to evaluate functional similarity (FS) of genes based on their co-expression, protein-protein interactions and sequence similarity. These measures are typically derived from hand-engineered and application-specific metrics to quantify the degree of shared information between two genes using their Gene Ontology (GO) annotations.
We introduce deepSimDEF, a deep learning method to automatically learn FS estimation of gene pairs given a set of genes and their GO annotations. deepSimDEF's key novelty is its ability to learn low-dimensional embedding vector representations of GO terms and gene products and then calculate FS using these learned vectors. We show that deepSimDEF can predict the FS of new genes using their annotations: it outperformed all other FS measures by >5-10% on yeast and human reference datasets on protein-protein interactions, gene co-expression and sequence homology tasks. Thus, deepSimDEF offers a powerful and adaptable deep neural architecture that can benefit a wide range of problems in genomics and proteomics, and its architecture is flexible enough to support its extension to any organism.
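The idea of computing FS from embedding vectors of GO terms can be illustrated with a small sketch. This is not the deepSimDEF architecture. It assumes, purely for illustration, that each gene is represented by the mean of its GO term vectors and that FS is scored with cosine similarity. Random vectors stand in for embeddings that the real model would learn from data. The function names (`make_embeddings`, `gene_vector`, `functional_similarity`), the gene names and the GO identifiers are all hypothetical placeholders.

```python
import math
import random


def make_embeddings(go_terms, dim=8, seed=0):
    """Assign a random vector to each GO term (stand-in for learned embeddings)."""
    rng = random.Random(seed)
    return {term: [rng.gauss(0.0, 1.0) for _ in range(dim)] for term in go_terms}


def gene_vector(go_terms, embeddings):
    """Represent a gene as the mean of its GO term vectors (an assumption)."""
    if not go_terms:
        raise ValueError("a gene needs at least one GO annotation")
    vectors = [embeddings[term] for term in go_terms]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def functional_similarity(terms_a, terms_b, embeddings):
    """Score a gene pair by comparing their pooled GO term vectors."""
    return cosine(gene_vector(terms_a, embeddings), gene_vector(terms_b, embeddings))


# Toy annotations: placeholder identifiers, not curated GO assignments.
annotations = {
    "geneA": ["GO:0000001", "GO:0000002"],
    "geneB": ["GO:0000002", "GO:0000003"],
    "geneC": ["GO:0000004"],
}
all_terms = sorted({t for terms in annotations.values() for t in terms})
embeddings = make_embeddings(all_terms)

for other in ("geneB", "geneC"):
    score = functional_similarity(annotations["geneA"], annotations[other], embeddings)
    print(f"FS(geneA, {other}) = {score:.3f}")
```

With random vectors the scores carry no biological meaning. In a trained model the embeddings, and possibly the comparison function itself, would be fitted to reference data such as the protein-protein interaction, co-expression or sequence homology datasets mentioned above.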
Source code and data are available at https://github.com/ahmadpgh/deepSimDEF.
Supplementary data are available at Bioinformatics online.