LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal.
J Biomed Semantics. 2023 Aug 14;14(1):11. doi: 10.1186/s13326-023-00291-x.
Predicting gene-disease associations typically requires exploring diverse sources of information as well as sophisticated computational approaches. Knowledge graph embeddings can help tackle these challenges by creating representations of genes and diseases based on the scientific knowledge described in ontologies, which can then be explored by machine learning algorithms. However, state-of-the-art knowledge graph embeddings are produced over a single ontology or multiple but disconnected ones, ignoring the impact that considering multiple interconnected domains can have on complex tasks such as gene-disease association prediction.
We propose a novel approach to predict gene-disease associations using rich semantic representations based on knowledge graph embeddings over multiple ontologies linked by logical definitions and compound ontology mappings. The experiments showed that considering richer knowledge graphs significantly improves gene-disease prediction and that different knowledge graph embedding methods benefit more from distinct types of semantic richness.
This work demonstrated the potential for knowledge graph embeddings across multiple and interconnected biomedical ontologies to support gene-disease prediction. It also paved the way for considering other ontologies or tackling other tasks where multiple perspectives over the data can be beneficial. All software and data are freely available.
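As a rough illustration of how embeddings of genes and diseases can be handed to a machine learning algorithm, the sketch below combines one gene vector and one disease vector into a single pair representation. It is a minimal sketch under stated assumptions, not the method used in this work: the identifiers `GENE_A`, `GENE_B` and `DISEASE_X`, the three-dimensional vectors, and the choice of an element-wise (Hadamard) product and cosine similarity are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy embeddings; real ones would be learned from a knowledge graph.
gene_emb = {
    "GENE_A": [0.9, 0.1, 0.3],
    "GENE_B": [-0.2, 0.8, -0.5],
}
disease_emb = {
    "DISEASE_X": [0.8, 0.2, 0.4],
}


def hadamard(u, v):
    """Element-wise product of two equal-length vectors."""
    return [a * b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_features(gene, disease):
    """Feature vector for a gene-disease pair (assumed operator: Hadamard)."""
    return hadamard(gene_emb[gene], disease_emb[disease])


def pair_score(gene, disease):
    """Unsupervised baseline score for a pair (assumed: cosine similarity)."""
    return cosine(gene_emb[gene], disease_emb[disease])
```

In a supervised setting, the output of `pair_features` for known positive and negative pairs would be passed to a classifier of choice, while `pair_score` gives a simple baseline ranking that needs no training.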